# Key Points of 10 Machine Learning Algorithms

1. Supervised learning

2. Unsupervised learning

3. Reinforcement learning

SVM

K-nearest neighbors (KNN)

K-means

1. Linear Regression

Y: dependent variable

a: slope

x: independent variable

b: intercept
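The four symbols above are presumably those of the straight line Y = a * x + b; the equation itself does not survive in this copy, so that reading is an assumption. A minimal sketch, assuming ordinary least squares on a single variable, of how a and b could be estimated. The helper `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for Y = a * x + b (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up points that lie exactly on Y = 2 * x + 1
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```

With real data the points will not sit exactly on a line, and the same formulas return the line that minimises the sum of squared vertical errors.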

Python code

#Import Library

#Import other necessary libraries like pandas, numpy...

from sklearn import linear_model

# Identify feature and response variable(s); values must be numeric NumPy arrays

x_train=input_variables_values_training_datasets

y_train=target_variables_values_training_datasets

x_test=input_variables_values_test_datasets

# Create linear regression object

linear=linear_model.LinearRegression()

# Train the model using the training sets and check score

linear.fit(x_train,y_train)

linear.score(x_train,y_train)

#Equation coefficient and Intercept

print('Coefficient: \n', linear.coef_)

print('Intercept: \n', linear.intercept_)

#Predict Output

predicted=linear.predict(x_test)

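R code

# Identify feature and response variable(s); values must be numeric

x_train <- input_variables_values_training_datasets

y_train <- target_variables_values_training_datasets

x_test <- input_variables_values_test_datasets

x <- cbind(x_train, y_train)

# Train the model using the training sets and check score

linear <- lm(y_train ~ ., data = x)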

summary(linear)

#Predict Output

predicted=predict(linear,x_test)

2. Logistic Regression

odds = p / (1 - p) = probability of event occurrence / probability of no event occurrence

ln(odds) = ln(p / (1 - p))

logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
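The logit equation above can be inverted to recover the probability p from the linear combination of predictors. A minimal sketch of both directions; the function names and the coefficient values are made up for illustration and do not come from a fitted model.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1 - p))


def predict_probability(coefs, intercept, features):
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1*X1 + ... + bk*Xk)))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up coefficients b1, b2 and intercept b0, purely for illustration
p = predict_probability(coefs=[0.8, -0.5], intercept=0.1, features=[2.0, 1.0])
print(p, logit(p))
```

Applying `logit` to the predicted probability returns the linear score b0 + b1X1 + b2X2 again, which is the relationship the equation states.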

Python code

#Import Library

from sklearn.linear_model import LogisticRegression

# Assumes X (predictors) and y (target) for the training set and x_test (predictors) for the test set

# Create logistic regression object

model=LogisticRegression()

# Train the model using the training sets and check score

model.fit(X,y)

model.score(X,y)

#Equation coefficient and Intercept

print('Coefficient: \n', model.coef_)

print('Intercept: \n', model.intercept_)

#Predict Output

predicted=model.predict(x_test)

R code

x <- cbind(x_train, y_train)

# Train the model using the training sets and check score

logistic <- glm(y_train ~ ., data = x, family = 'binomial')

summary(logistic)

#Predict Output

predicted=predict(logistic,x_test)

3. Decision Tree

(Source: statsexchange)

Python code

#Import Library

#Import other necessary libraries like pandas, numpy...

from sklearn import tree

# Assumes X (predictors) and y (target) for the training set and x_test (predictors) for the test set

# Create tree object

model = tree.DecisionTreeClassifier(criterion='gini')  # for classification; criterion can be 'gini' or 'entropy' (information gain), the default is 'gini'

# model = tree.DecisionTreeRegressor() for regression

# Train the model using the training sets and check score

model.fit(X,y)

model.score(X,y)

#Predict Output

predicted=model.predict(x_test)
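The `criterion` option in the Python code above selects between Gini impurity and entropy as the measure of how mixed the labels in a node are. A minimal sketch of the two measures using their standard textbook definitions; the helper names and the label lists are made up for illustration, and this is not scikit-learn's internal implementation.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


mixed = ["yes", "yes", "no", "no"]   # evenly split node
pure = ["yes", "yes", "yes", "yes"]  # node with a single class
print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest for an even split, so a split that lowers either one makes the child nodes more homogeneous.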

R code

library(rpart)

x <- cbind(x_train, y_train)

# grow tree

fit <- rpart(y_train ~ ., data = x, method = "class")

summary(fit)

#Predict Output

predicted=predict(fit,x_test)

4. Support Vector Machine (SVM)

Python code

#Import Library

from sklearn import svm

# Assumes X (predictors) and y (target) for the training set and x_test (predictors) for the test set

# Create SVM classification object

model = svm.SVC()  # many options are available; this is the simplest setup for classification, see the scikit-learn documentation for details

# Train the model using the training sets and check score

model.fit(X,y)

model.score(X,y)

#Predict Output

predicted=model.predict(x_test)

R code

library(e1071)

x <- cbind(x_train, y_train)

# Fitting model

fit <- svm(y_train ~ ., data = x)

summary(fit)

#Predict Output

predicted=predict(fit,x_test)

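5. Naive Bayes

P(c|x) is the posterior probability of the class (target) given the predictor (attribute)

P(c) is the prior probability of the class

P(x|c) is the likelihood, that is, the probability of the predictor given the class

P(x) is the prior probability of the predictor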
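These four quantities are tied together by Bayes' theorem, P(c|x) = P(x|c) * P(c) / P(x); the formula itself does not survive in this copy, so it is stated here as the standard result. A minimal sketch with made-up probabilities for two classes; the function name and all numbers are illustrative only.

```python
def posterior(prior_c, likelihood_x_given_c, evidence_x):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood_x_given_c * prior_c / evidence_x


# Made-up numbers for two classes, "yes" and "no"
p_yes, p_no = 0.6, 0.4                   # priors P(c)
p_x_given_yes, p_x_given_no = 0.2, 0.5   # likelihoods P(x|c)

# P(x) via the law of total probability over both classes
p_x = p_x_given_yes * p_yes + p_x_given_no * p_no

post_yes = posterior(p_yes, p_x_given_yes, p_x)
post_no = posterior(p_no, p_x_given_no, p_x)
print(post_yes, post_no)
```

The classifier would pick the class with the larger posterior. Because P(x) is the same for every class, it only rescales the scores and does not change which class wins.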

Python code

#Import Library

from sklearn.naive_bayes import GaussianNB

# Assumes X (predictors) and y (target) for the training set and x_test (predictors) for the test set

# Create the Gaussian Naive Bayes classification object

model = GaussianNB()  # other variants exist, such as multinomial and Bernoulli Naive Bayes

# Train the model using the training sets and check score

model.fit(X,y)

#Predict Output

predicted=model.predict(x_test)

R code

library(e1071)

x <- cbind(x_train, y_train)

# Fitting model

fit <- naiveBayes(y_train ~ ., data = x)

summary(fit)

#Predict Output

predicted=predict(fit,x_test)

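6. KNN (K-Nearest Neighbors)

KNN is computationally expensive: it keeps the whole training set, and each prediction has to compare the new point against the stored training points.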
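A minimal brute-force sketch of where that cost comes from, assuming Euclidean distance and majority voting. The function `knn_predict` and the sample points are made up for illustration; library implementations may use tree-based indexes instead of a full scan.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # One distance per training point: this full scan is what makes prediction costly.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

There is no training step to speak of; all the work happens at prediction time and grows with the number of stored points.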

Python code

#Import Library

from sklearn.neighbors import KNeighborsClassifier

# Assumes X (predictors) and y (target) for the training set and x_test (predictors) for the test set

# Create the KNeighbors classifier object

model = KNeighborsClassifier(n_neighbors=6)  # default value for n_neighbors is 5

# Train the model using the training sets and check score

model.fit(X,y)

#Predict Output

predicted=model.predict(x_test)

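R code

library(class)  # provides knn(); there is no package named "knn"

# Assumes x_train, y_train and x_test are defined as in the earlier R examples

# knn() has no separate fitting step: it classifies x_test directly

predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)

summary(predicted)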

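7. K-Means

K-means is an unsupervised learning algorithm that solves clustering problems. It sorts a data set into a chosen number of clusters (say k clusters) through a simple procedure. Data points within a cluster are homogeneous and differ from the points in other clusters.

How K-means forms clusters:

K-means picks k points, one for each cluster. These points are called centroids.

K-means works with clusters, each of which has its own centroid. The sum of the squared distances between a cluster's centroid and its data points is that cluster's within-cluster sum of squares. Adding these values over all clusters gives the total within-cluster sum of squares of the clustering solution.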
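The within-cluster sum of squares described above can be sketched in a few lines. This assumes each centroid is the coordinate-wise mean of its cluster's points and that the cluster assignment is already given; the helper names and the sample clusters are made up for illustration.

```python
def centroid_of(points):
    """Centroid taken as the coordinate-wise mean of the points (assumption)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_cluster_ss(points, centroid):
    """Sum of squared distances between one centroid and its cluster's points."""
    return sum(
        sum((p_i - c_i) ** 2 for p_i, c_i in zip(point, centroid))
        for point in points
    )


# Two made-up clusters with a fixed assignment
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 10.0), (12.0, 10.0), (11.0, 13.0)],
]
per_cluster = [within_cluster_ss(c, centroid_of(c)) for c in clusters]
total = sum(per_cluster)
print(per_cluster, total)
```

The total is the quantity a clustering solution tries to keep small; comparing it across different values of k is one common way to judge how many clusters to use.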

Python code

#Import Library

from sklearn.cluster import KMeans

# Assumes X (attributes) for the training set and x_test (attributes) for the test set

# Create the KMeans clustering object

model = KMeans(n_clusters=3, random_state=0)  # the seed value is missing in the source; 0 is an assumed placeholder

# Train the model using the training sets and check score

model.fit(X)

#Predict Output

predicted=model.predict(x_test)

R code

library(cluster)

fit <- kmeans(X, 3)  # 3-cluster solution; kmeans() comes from base R's stats package

8. Random Forest

• Original link: http://kuaibao.qq.com/s/20180219A0OYN300?refer=cp_1026