Machine Learning in Action (2): a simple KNN algorithm

Source: Penguin Account (企鹅号) - 说疯话的小聋瞎

1. KNN: k-Nearest Neighbors

2. The KNN algorithm works like this:

We have an existing set of example data, our training set. Every piece of this data is labeled, so we know which class each piece falls into.

When we’re given a new piece of data without a label, we compare that new piece of data to every piece of data in the training set.

We then take the most similar pieces of data (the nearest neighbors) and look at their labels. We look at the top k most similar pieces of data from our known data set. This is where the k comes from. (k is an integer and it’s usually less than 20.)

How do we compare the new data with each piece of training data? We can treat each piece of data as a point in a multidimensional space and use the Euclidean distance between two points to measure their similarity. The shorter the distance, the more similar the two pieces of data are.
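To make the distance idea concrete, here is a minimal sketch of the Euclidean distance between two points. The function name euclidean_distance and the sample points are assumptions made for illustration; they do not come from the original article.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of features."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Square the per-feature differences, add them up, take the square root.
    return np.sqrt(np.sum((a - b) ** 2))

# Hypothetical points: the first pair is close together, the second pair is far apart.
d_near = euclidean_distance([1.0, 1.1], [1.0, 1.0])
d_far = euclidean_distance([1.0, 1.1], [0.0, 0.0])
```

With these points d_near is smaller than d_far, so the first pair counts as more similar.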

Lastly, we take a majority vote from the k most similar pieces of data, and the majority is the new class we assign to the data we were asked to classify.
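The majority vote can be sketched in a few lines. This version uses collections.Counter from the standard library, which is one possible way to count labels and is not necessarily what the original author used; the name majority_vote and the sample labels are hypothetical.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the label that appears most often among the k nearest neighbors."""
    # most_common(1) gives a list with one (label, count) pair.
    return Counter(neighbor_labels).most_common(1)[0][0]

# Hypothetical labels of the k = 3 nearest neighbors.
winner = majority_vote(['B', 'A', 'B'])
```

Here 'B' appears twice and 'A' once, so the new data would be assigned to class 'B'.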

3. Python preparation before implementing the algorithm

(1) Import data with Python

We need the NumPy package.

We need the operator module.
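A minimal sketch of this preparation step might look like the following. The helper create_data_set and its four toy points are made up for illustration; a real data set would be loaded from a file instead.

```python
import operator  # used later to sort the vote counts
import numpy as np

def create_data_set():
    """Return a tiny hypothetical training set and its labels."""
    group = np.array([[1.0, 1.1],
                      [1.0, 1.0],
                      [0.0, 0.0],
                      [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']  # labels[i] is the class of group[i]
    return group, labels

group, labels = create_data_set()
```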

4. Steps of a simple KNN algorithm

(1) We need a training data set, a label set that holds the label of each training example, and a piece of new data to be classified.

(2) Calculate the Euclidean distance between the new data and each training example, and sort the distances in ascending order.

(3) Find the labels of the top k items and sort the labels by how often they occur.

(4) Pick the label that occurs most frequently as the label of the new data.

(5) Test the classifier by calculating the error rate.
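Step (5) can be sketched as follows, assuming the error rate means the fraction of test examples that the classifier labels incorrectly. The function name error_rate and the sample label lists are hypothetical.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test example is required")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

# Hypothetical classifier output versus the true labels of four test examples.
rate = error_rate(['A', 'B', 'B', 'A'], ['A', 'B', 'A', 'A'])
```

One of the four predictions is wrong here, so the error rate is 0.25.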

5. Implementation of the simple KNN algorithm:

(1) Parameters:

Data set:

The data collected before the algorithm runs. It contains many pieces of data, and each piece has a value for every feature.

Label set:

The class label of each piece of data in the data set. For example, the training example dataSet[0] belongs to class labels[0].

k:

The number of most similar pieces of data that the algorithm selects.

inputData:

A new piece of data that needs to be labeled by the KNN algorithm.

(2) Implementation:
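A minimal sketch of such an implementation, following the steps in section 4, is given here. The function name classify_knn and the toy data are assumptions; the parameter names follow the list in (1).

```python
import operator
import numpy as np

def classify_knn(inputData, dataSet, labels, k):
    """Label inputData by a majority vote among its k nearest training examples."""
    dataSet = np.asarray(dataSet, dtype=float)
    n = dataSet.shape[0]                       # number of training examples
    # Repeat inputData n times so it can be subtracted from every row.
    diff = np.tile(inputData, (n, 1)) - dataSet
    distances = np.sqrt((diff ** 2).sum(axis=1))
    nearest = distances.argsort()              # indices, closest first
    classCount = {}
    for i in range(k):
        vote = labels[nearest[i]]
        classCount[vote] = classCount.get(vote, 0) + 1
    # Sort (label, count) pairs by count, largest first.
    ranked = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[0][0]

# Hypothetical toy data.
dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
result = classify_knn([0.0, 0.2], dataSet, labels, 3)
```

For this toy data the three nearest neighbors of [0.0, 0.2] carry the labels 'B', 'B' and 'A', so the result is 'B'.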

(3) Notes on some Python functions:

*attribute shape gives the size of an array or a matrix, but lists and tuples do not have it.

array.shape is a tuple. For a 2-D array, the first element of the tuple is the number of rows and the second element is the number of columns.
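A small sketch of shape in use, with a made-up 2 x 3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
shape = arr.shape          # a tuple: (rows, columns)
rows, cols = shape

plain_list = [[1, 2, 3], [4, 5, 6]]
has_shape = hasattr(plain_list, "shape")   # a list has no shape attribute
list_shape = np.array(plain_list).shape    # convert to an array first
```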

*function tile(object, (row, column)) from NumPy repeats an array-like object row times vertically and column times horizontally, and the return type is array.
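For instance, with a hypothetical two-feature point:

```python
import numpy as np

point = [0.0, 0.2]
# Repeat the point 4 times down the rows and once across the columns.
tiled = np.tile(point, (4, 1))
# Repeat it once down the rows and twice across the columns.
wide = np.tile(point, (1, 2))
```

Here tiled has shape (4, 2), one copy of the point per row, and wide has shape (1, 4).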

*array ** 2 is different from matrix ** 2: array ** 2 is element-wise, while matrix ** 2 is the matrix product of the matrix with itself.
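The difference shows up on a small made-up 2 x 2 example. It uses np.matrix only to illustrate the point; the @ operator gives the same matrix product for ordinary arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix([[1, 2], [3, 4]])

a_sq = a ** 2      # element-wise: every entry is squared
m_sq = m ** 2      # matrix product: m multiplied by m
a_matmul = a @ a   # matrix product for plain arrays
```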

*function sum(axis=0/1) calculates the sum of a matrix or an array along one dimension: axis=0 adds down each column and gives one sum per column, axis=1 adds across each row and gives one sum per row.
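A short sketch on a made-up 2 x 3 array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
col_sums = a.sum(axis=0)   # one sum per column
row_sums = a.sum(axis=1)   # one sum per row
total = a.sum()            # no axis: sum of all elements
```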

*function argsort() does not return sorted values. It returns the array of indices that would sort an array or a matrix along a specific axis, in ascending order.
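As an illustration with made-up distances:

```python
import numpy as np

distances = np.array([1.35, 1.28, 0.2, 0.1])
order = distances.argsort()        # indices from smallest to largest value
sorted_distances = distances[order]  # indexing with order yields the sorted values
```

The smallest distance sits at index 3, so order starts with 3.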

*dict.get(key, default) returns the value stored under the key. If the key doesn't exist, it returns the default value, which is None when omitted. This function never adds the key to the dictionary; the key is only added when we assign a value to it explicitly.
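The counting pattern built on dict.get can be sketched like this, with made-up labels:

```python
counts = {}
missing = counts.get('A', 0)        # key is absent, so the default 0 comes back
still_absent = 'A' not in counts    # get() did not insert the key

for label in ['B', 'B', 'A']:
    # Explicit assignment is what actually stores the key.
    counts[label] = counts.get(label, 0) + 1
```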

*two ways to sort a list or a dict:

A. the list method sort(key=None, reverse=False), which sorts the list in place.

B. the built-in function sorted(iterable, key=None, reverse=False), which returns a new sorted list.

Notice: in the functions above, the parameter key is a function, not a value. It is called on each element of the iterable object, and the elements are ordered by the values it returns. For example, to sort a list of rows by the 2nd element of each row, key must be a function that returns the 2nd element of a row.

For example: sorted(d.keys()) returns the keys of a dictionary d as a sorted list. The dictionary itself is not changed.
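Both ways of sorting, and the role of key, can be sketched with made-up (label, count) pairs:

```python
pairs = [('B', 2), ('A', 1), ('C', 3)]

# sorted() returns a new list; key picks the 2nd element of each pair.
by_count = sorted(pairs, key=lambda p: p[1], reverse=True)

# list.sort() changes the list in place and returns None.
by_label = list(pairs)
returned = by_label.sort(key=lambda p: p[0])

# Sorting the keys of a dictionary gives a list; the dictionary is untouched.
counts = {'B': 2, 'A': 1}
sorted_keys = sorted(counts.keys())
```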

*dict.items(), dict.keys() and dict.values() return iterable objects over a dictionary's (key, value) pairs, keys and values. In Python 3 these are view objects, not lists.
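A brief sketch, assuming Python 3.7 or later (where dictionaries keep insertion order) and a made-up dictionary:

```python
counts = {'B': 2, 'A': 1}

keys = list(counts.keys())
values = list(counts.values())
items = list(counts.items())    # list of (key, value) tuples

# A view follows later changes to the dictionary.
key_view = counts.keys()
counts['C'] = 3
keys_after = list(key_view)
```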

*operator.itemgetter(a1, a2) returns a function. The parameter of this function is an object that supports indexing, and the function returns a tuple holding the (a1)th and (a2)th items of that object.

So we can combine itemgetter(a1, a2) with sorted() to achieve multilevel sorting.
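Multilevel sorting with itemgetter can be sketched as follows. The records are made up: each is a (label, count) pair, sorted first by count and then by label.

```python
import operator

# itemgetter(0, 2) builds a function that fetches items 0 and 2 as a tuple.
get_two = operator.itemgetter(0, 2)
picked = get_two(['a', 'b', 'c'])

records = [('B', 2), ('A', 2), ('C', 1)]
# Primary sort key: item 1 (count). Ties are broken by item 0 (label).
ranked = sorted(records, key=operator.itemgetter(1, 0))
```

The two records with count 2 end up ordered by label, 'A' before 'B'.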

A detailed explanation of the function sorted() and operator.itemgetter(): https://zhidao.baidu.com/question/2270484067768455068.htmlfr=iks&word=python+operator.itemgetter%28%29&ie=gbk

Published: 2018-07-09 09:02:58
Original link: https://kuaibao.qq.com/s/20180709G0B6EB00?refer=cp_1026