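The idea behind K-nearest neighbors (KNN): your class is determined by your "neighbors", that is, by the labels of the training samples closest to you.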
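As a rough illustration of that idea, here is a minimal from-scratch sketch using only the standard library. The function name `knn_predict`, the choice of Euclidean distance, and the small set of (fight scenes, kiss scenes) points are assumptions made for the example, not something the original prescribes.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative points: (fight scenes, kiss scenes), with labels 1 and 2
points = [(3, 104), (2, 100), (1, 81), (99, 5), (98, 2)]
labels = [1, 1, 1, 2, 2]

# A query with few fights and many kisses sits next to the label-1 points
print(knn_predict(points, labels, (18, 90)))
```

The sketch scans every training point for each query, which is fine for a handful of rows; library implementations typically use tree or other index structures to speed this up.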

# Example 1

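```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd


def knncls(csv_path="./movies.csv"):
    """
    Predict movie genres with KNN.

    :param csv_path: path to the sample CSV; the file name is an assumption,
        since the original does not show how `data` is loaded
    :return: None
    """
    # Load the data (the original used `data` without defining it)
    data = pd.read_csv(csv_path)

    # Extract the feature values and the target values
    x = data.drop(["type", "movie_name"], axis=1)
    y = data["type"]

    # Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

    # Fit KNN and predict
    knn = KNeighborsClassifier()
    knn.fit(x_train, y_train)

    y_predict = knn.predict(x_test)
    print(x_test, "predicted as:", y_predict)

    print("Accuracy:", knn.score(x_test, y_test))


if __name__ == '__main__':
    knncls()
```

Sample data:

```
movie_name,fight,kiss,type
California Man,3,104,1
He's not Really into dues,2,100,1
Beautiful Woman,1,81,1
Robo Slayer 3000,99,5,2
Amped II,98,2,2
unname,18,90,1
vampire,90,15,2
```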
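One thing to watch with such a small table: seven rows with `test_size=0.25` leave five rows for training, so the default `n_neighbors=5` lets every training row vote and each prediction is simply the majority class of the training split. The sketch below sidesteps that by fitting on all seven rows with `n_neighbors=3` and classifying a single new movie. The inline CSV text mirrors the sample data above; the value of `k` and the `new_movie` feature values are assumptions chosen for illustration.

```python
import csv
import io

from sklearn.neighbors import KNeighborsClassifier

# Same rows as the sample data, embedded so the sketch runs without a file
CSV_TEXT = """movie_name,fight,kiss,type
California Man,3,104,1
He's not Really into dues,2,100,1
Beautiful Woman,1,81,1
Robo Slayer 3000,99,5,2
Amped II,98,2,2
unname,18,90,1
vampire,90,15,2
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
x = [[int(r["fight"]), int(r["kiss"])] for r in rows]  # features
y = [int(r["type"]) for r in rows]                     # target labels

# k=3 is an assumed value, smaller than the number of training rows
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x, y)

# Hypothetical new movie: 5 fight scenes, 95 kiss scenes
new_movie = [[5, 95]]
print(knn.predict(new_movie)[0])
```

Fitting on every row leaves nothing to score against, so this only shows the prediction step; the accuracy printed by the split-based version on seven rows is not a reliable estimate either.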

# Example 2

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd


def knncls():
    """
    Predict the check-in place with KNN.

    :return: None
    """
    # Read the first 100,000 rows with pandas
    train_data = pd.read_csv("./data/fb/train.csv", nrows=100000)

    # Feature engineering
    # 1. Narrow the x, y range (copy so the column assignments below
    #    operate on an independent DataFrame, not a slice)
    train_data = train_data.query("x>1.0 & x<1.5 & y>1.0 & y<2.5").copy()

    # 2. Parse the timestamp (seconds) into datetime values
    time_value = pd.to_datetime(train_data["time"], unit="s")
    time_value = pd.DatetimeIndex(time_value)

    # 3. Add time-based features
    train_data["weekday"] = time_value.weekday
    train_data["day"] = time_value.day
    train_data["hour"] = time_value.hour
    train_data["minute"] = time_value.minute

    # 4. Drop the raw timestamp feature
    train_data = train_data.drop(["time"], axis=1)

    # 5. Keep only places with more than 3 check-ins and build a new train_data
    place_count = train_data.groupby("place_id").count()
    place_count_r = place_count[place_count.row_id > 3].reset_index()
    train_data = train_data[train_data["place_id"].isin(place_count_r["place_id"])]

    # Extract the feature values and the target values
    x = train_data.drop(["place_id", "row_id"], axis=1)
    y = train_data["place_id"]

    # Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

    # Standardize: fit on the training set only, then apply to the test set
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)

    # Instantiate the KNN estimator and fit it
    knn = KNeighborsClassifier()
    knn.fit(x_train, y_train)

    # Predict
    y_predict = knn.predict(x_test)

    # Print the accuracy
    print("Accuracy:", knn.score(x_test, y_test))

    return None


if __name__ == '__main__':
    knncls()
```
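The standardization step matters for KNN because the prediction depends on distances: a feature with a large numeric range (such as the hour, 0 to 23) can swamp one with a small range (such as `x` between 1.0 and 1.5). The following standard-library sketch shows the effect on a few made-up rows. The helper names `fit_scaler` and `transform` and the sample values are hypothetical; the formula `(value - mean) / std` with the population standard deviation is what `StandardScaler` computes by default.

```python
import math
from statistics import mean, pstdev


def fit_scaler(rows):
    """Return per-column (mean, std), computed from the training rows only."""
    columns = list(zip(*rows))
    return [(mean(col), pstdev(col)) for col in columns]


def transform(rows, params):
    """Apply z = (value - mean) / std column by column."""
    return [
        tuple((value - mu) / sigma for value, (mu, sigma) in zip(row, params))
        for row in rows
    ]


# Hypothetical rows: (x coordinate, hour of day)
train = [(1.0, 0), (1.1, 23), (1.4, 1), (1.5, 22)]

# Raw distances: the hour column dominates, so the x coordinate barely counts
print("raw   :", math.dist(train[0], train[1]), math.dist(train[0], train[2]))

# After scaling, both columns contribute on a comparable scale
params = fit_scaler(train)
scaled = transform(train, params)
print("scaled:", math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```

The parameters come from the training rows alone and are then reused for any test rows, mirroring `fit_transform` on `x_train` followed by `transform` on `x_test`.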
