
Implementing kNN (k-Nearest Neighbors) in Python

西西嘛呦 · 2020-08-26

What is k-nearest neighbors?

k-nearest neighbors can be used for both classification and regression; classification is used as the example here. Given a training set and a new input instance, find the k instances in the training set that are closest to it; the input instance is then assigned to the class that the majority of those k instances belong to.

What are the three basic elements of the kNN model?

The distance metric, the choice of k, and the classification decision rule.

Distance metric: usually the Euclidean distance, but the Lp distance or the Manhattan distance can also be used.
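For reference, the general Lp distance between two n-dimensional points can be written as

$$L_p(x_i, x_j) = \left(\sum_{l=1}^{n} \left|x_i^{(l)} - x_j^{(l)}\right|^p\right)^{1/p}$$

where p = 2 gives the Euclidean distance and p = 1 gives the Manhattan distance.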

Below is a concrete example:
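(A minimal sketch; the two points here are assumed for illustration, not taken from the original post. It shows how the same pair of points gives different Lp distances as p varies.)

import numpy as np

x1 = np.array([1.0, 1.0])
x2 = np.array([4.0, 5.0])

for p in (1, 2, 3):
    # Lp distance: p-th root of the sum of |coordinate difference|^p
    lp = np.sum(np.abs(x1 - x2) ** p) ** (1 / p)
    print(f"L{p} distance: {lp:.4f}")

# Output: L1 distance: 7.0000, L2 distance: 5.0000, L3 distance: 4.4979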

How should k be chosen?

A smaller k makes the model more complex and more sensitive to noise in the few closest training points, which can lead to overfitting; a larger k makes the model simpler, but faraway points start to influence the prediction, which can lead to underfitting. In practice, k is usually chosen by cross-validation.
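As a rough sketch of choosing k by cross-validation (this uses scikit-learn's KNeighborsClassifier and cross_val_score rather than the KNN class implemented below):

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each candidate k
scores = {}
for k in range(1, 16):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("best k:", best_k, "cv accuracy:", round(scores[best_k], 4))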

Next comes the code implementation:

from __future__ import print_function, division
import numpy as np
from mlfromscratch.utils import euclidean_distance

class KNN():
    """ K Nearest Neighbors classifier.

    Parameters:
    -----------
    k: int
        The number of closest neighbors that will determine the class of the 
        sample that we wish to predict.
    """
    def __init__(self, k=5):
        self.k = k

    def _vote(self, neighbor_labels):
        """ Return the most common class among the neighbor samples """
        counts = np.bincount(neighbor_labels.astype('int'))
        return counts.argmax()

    def predict(self, X_test, X_train, y_train):
        y_pred = np.empty(X_test.shape[0])
        # Determine the class of each sample
        for i, test_sample in enumerate(X_test):
            # Sort the training samples by their distance to the test sample and get the K nearest
            idx = np.argsort([euclidean_distance(test_sample, x) for x in X_train])[:self.k]
            # Extract the labels of the K nearest neighboring training samples
            k_nearest_neighbors = np.array([y_train[i] for i in idx])
            # Label sample as the most common class label
            y_pred[i] = self._vote(k_nearest_neighbors)

        return y_pred
        

Some of the numpy functions used here:

numpy.bincount(): counts the number of occurrences of each non-negative integer value in an array.

numpy.argmax(): returns the index of the largest element.

numpy.argsort(): returns the indices that would sort the array in ascending order.
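A quick illustration of these three functions (the values here are made up for demonstration):

import numpy as np

labels = np.array([2, 1, 2, 0, 2])

# bincount: occurrence count of each non-negative integer value
counts = np.bincount(labels)
print(counts)               # [1 1 3]

# argmax: index of the largest count, i.e. the most frequent label
print(counts.argmax())      # 2

# argsort: indices that would sort the array in ascending order
distances = np.array([0.9, 0.1, 0.5])
print(np.argsort(distances))        # [1 2 0]
print(np.argsort(distances)[:2])    # the 2 nearest -> [1 2]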

Next, the euclidean_distance() function used above:

import math

def euclidean_distance(x1, x2):
    """ Calculates the l2 distance between two vectors """
    distance = 0
    # Squared distance between each coordinate
    for i in range(len(x1)):
        distance += pow((x1[i] - x2[i]), 2)
    return math.sqrt(distance)

The L2 (Euclidean) distance is used here.
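For example (a quick check, assuming the function above is in scope):

print(euclidean_distance([0, 0], [3, 4]))   # 5.0

An equivalent vectorized form is np.linalg.norm(np.array(x1) - np.array(x2)).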

The main function for running it:

from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

from mlfromscratch.utils import train_test_split, normalize, accuracy_score
from mlfromscratch.utils import euclidean_distance, Plot
from mlfromscratch.supervised_learning import KNN

def main():
    data = datasets.load_iris()
    X = normalize(data.data)
    y = data.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    clf = KNN(k=5)
    y_pred = clf.predict(X_test, X_train, y_train)
    
    accuracy = accuracy_score(y_test, y_pred)

    print ("Accuracy:", accuracy)

    # Reduce dimensions to 2d using pca and plot the results
    Plot().plot_in_2d(X_test, y_pred, title="K Nearest Neighbors", accuracy=accuracy, legend_labels=data.target_names)


if __name__ == "__main__":
    main()

Result:

Accuracy: 0.9795918367346939

Theory: from the book 统计学习方法 (Statistical Learning Methods).

Code source: https://github.com/eriklindernoren/ML-From-Scratch
