
A Cheat Sheet of Machine Learning Algorithms, with Python and R Code Side by Side

Translated by Ding Xue; proofread by Wang Fangsi

In Think and Grow Rich, Napoleon Hill recounts the story of Darby, who dug for gold for years only to give up and walk away when he was just one step short of the vein.

The Chinese edition of Think and Grow Rich on Douban Read:

http://read.douban.com/reader/ebook/10954762/

(Adapted from the content of the book)

I don't know whether that story is true, but I do know there are plenty of "data Darbys" around me. These people understand the purpose and mechanics of machine learning, yet they apply only two or three algorithms to every problem they study. They never update themselves with better algorithms and techniques, either because they are too stubborn or because they are content to spend time without making progress.

Like Darby, these people give up just as they near the finish line. In the end they abandon machine learning with excuses such as heavy computation, excessive difficulty, or the inability to set a proper threshold to tune the model. What's the point of that? Does this sound like anyone you know?

Today's cheat sheet aims to change these "data Darbys'" attitude toward machine learning and turn them into hands-on practitioners. It collects the ten most commonly used machine learning algorithms, with code in both Python and R.

Given how widely machine learning methods are now used in modeling, the cheat sheet below can serve as a code reference to help you put these algorithms to work. Good luck!

And for the truly lazy data Darbys among you, we'll make life even easier: you can download a PDF version of the cheat sheet and copy-paste the code directly.

Machine Learning Algorithms, by Type

  • Supervised learning: Decision Tree, K-Nearest Neighbors (KNN), Random Forest, Logistic Regression
  • Unsupervised learning: Apriori, K-Means, Hierarchical Clustering
  • Reinforcement learning: Markov Decision Process, Q-Learning

Linear Regression

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
#Load Train and Test datasets
#Identify feature and response variable(s);
#values must be numeric and numpy arrays
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
#Create linear regression object
linear = linear_model.LinearRegression()
#Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
#Equation coefficient and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
#Predict Output
predicted = linear.predict(x_test)

#Load Train and Test datasets
#Identify feature and response variable(s);
#values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
#Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
#Predict Output
predicted <- predict(linear, x_test)
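The Python snippet above uses placeholder variable names. As a minimal runnable sketch, here is the same workflow on hypothetical synthetic data (the arrays and their shapes are invented for illustration):

```python
import numpy as np
from sklearn import linear_model

# Hypothetical synthetic data standing in for the cheat sheet's placeholders
rng = np.random.RandomState(0)
x_train = rng.rand(100, 2)                           # 100 samples, 2 features
y_train = 3 * x_train[:, 0] - 2 * x_train[:, 1] + 1  # noiseless linear target
x_test = rng.rand(10, 2)

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
print('Coefficient:', linear.coef_)     # close to [3, -2]
print('Intercept:', linear.intercept_)  # close to 1
predicted = linear.predict(x_test)
```

Because the target is an exact linear function of the features, the fitted coefficients recover the generating values and the training R² score is essentially 1.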

Logistic Regression

#Import Library
from sklearn.linear_model import LogisticRegression
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test data set
#Create logistic regression object
model = LogisticRegression()
#Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
#Predict Output
predicted = model.predict(x_test)

x <- cbind(x_train, y_train)
#Train the model using the training sets and check score
logistic <- glm(y_train ~ ., data = x, family = 'binomial')
summary(logistic)
#Predict Output
predicted <- predict(logistic, x_test)
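To see the Python snippet above run end to end, here is a minimal sketch on hypothetical, linearly separable two-class data (the data-generating rule is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: two linearly separable classes
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # class 1 above the line x0 + x1 = 0

model = LogisticRegression()
model.fit(X, y)
predicted = model.predict(X)  # array of 0/1 class labels
```

Since the classes are separable by a line, the learned decision boundary classifies nearly all training points correctly.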

Decision Tree

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import tree
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test data set
#Create tree object for classification; the split criterion can be
#'gini' or 'entropy' (information gain); the default is 'gini'
model = tree.DecisionTreeClassifier(criterion='gini')
#model = tree.DecisionTreeRegressor() for regression
#Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Predict Output
predicted = model.predict(x_test)

#Import Library
library(rpart)
x <- cbind(x_train, y_train)
#Grow tree
fit <- rpart(y_train ~ ., data = x, method = "class")
summary(fit)
#Predict Output
predicted <- predict(fit, x_test)

Support Vector Machine (SVM)

#Import Library
from sklearn import svm
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test data set
#Create SVM classification object; there are various options
#associated with it, this is a simple classification setup
model = svm.SVC()
#Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Predict Output
predicted = model.predict(x_test)

#Import Library
library(e1071)
x <- cbind(x_train, y_train)
#Fitting model
fit <- svm(y_train ~ ., data = x)
summary(fit)
#Predict Output
predicted <- predict(fit, x_test)

Naive Bayes

#Import Library
from sklearn.naive_bayes import GaussianNB
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test data set
#Create Gaussian Naive Bayes object; there are other variants
#for other distributions, e.g. Multinomial and Bernoulli Naive Bayes
model = GaussianNB()
#Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted = model.predict(x_test)

#Import Library
library(e1071)
x <- cbind(x_train, y_train)
#Fitting model
fit <- naiveBayes(y_train ~ ., data = x)
summary(fit)
#Predict Output
predicted <- predict(fit, x_test)

K-Nearest Neighbors (KNN)

#Import Library
from sklearn.neighbors import KNeighborsClassifier
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test data set
#Create KNeighbors classifier object
model = KNeighborsClassifier(n_neighbors=6)  #default n_neighbors is 5
#Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted = model.predict(x_test)

#Import Library (knn() lives in the 'class' package)
library(class)
#knn() fits and classifies the test set in a single step,
#so there is no separate model object to predict from
predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)

K-Means

#Import Library
from sklearn.cluster import KMeans
#Assumed you have X (attributes) for the training data set
#and x_test (attributes) for the test data set
#Create KMeans object
k_means = KMeans(n_clusters=3, random_state=0)
#Train the model using the training sets and check score
k_means.fit(X)
#Predict Output
predicted = k_means.predict(x_test)

#Import Library
library(cluster)
#3-cluster solution
fit <- kmeans(X, 3)
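A runnable sketch of the Python snippet above, assuming two well-separated synthetic blobs rather than real data (both the blobs and the offset of 10 are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated blobs of 50 points each
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 10])

# n_init is set explicitly so the snippet behaves the same across versions
k_means = KMeans(n_clusters=2, random_state=0, n_init=10)
labels = k_means.fit_predict(X)  # cluster index (0 or 1) for each point
```

With blobs this far apart, each blob ends up entirely in its own cluster, regardless of which cluster gets index 0.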

Random Forest

#Import Library
from sklearn.ensemble import RandomForestClassifier
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test data set
#Create Random Forest object
model = RandomForestClassifier()
#Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted = model.predict(x_test)

#Import Library
library(randomForest)
x <- cbind(x_train, y_train)
#Fitting model
fit <- randomForest(y_train ~ ., data = x, ntree = 500)
summary(fit)
#Predict Output
predicted <- predict(fit, x_test)

Dimensionality Reduction

#Import Library
from sklearn import decomposition
#Assumed you have training and test data sets as train and test
#Create PCA object; the default value of k is min(n_samples, n_features)
pca = decomposition.PCA(n_components=k)
#For Factor Analysis:
#fa = decomposition.FactorAnalysis()
#Reduce the dimension of the training data set using PCA
train_reduced = pca.fit_transform(train)
#Reduce the dimension of the test data set
test_reduced = pca.transform(test)

#Import Library
library(stats)
pca <- princomp(train, cor = TRUE)
train_reduced <- predict(pca, train)
test_reduced <- predict(pca, test)
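In the Python snippet above, k is a placeholder. A concrete sketch with hypothetical random data and n_components=2 (the shapes 100×5 and 20×5 are invented for illustration):

```python
import numpy as np
from sklearn import decomposition

# Hypothetical data: 100 training and 20 test samples with 5 features
rng = np.random.RandomState(0)
train = rng.rand(100, 5)
test = rng.rand(20, 5)

pca = decomposition.PCA(n_components=2)
train_reduced = pca.fit_transform(train)  # project onto the top 2 components
test_reduced = pca.transform(test)        # reuse the projection fit on train
```

Note that the test set is only transformed, never refit: the components learned from the training data define the projection for both.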

Gradient Boosting (GBDT)

#Import Library
from sklearn.ensemble import GradientBoostingClassifier
#Assumed you have X (predictor) and y (target) for the
#training data set and x_test (predictor) for the test data set
#Create Gradient Boosting Classifier object
model = GradientBoostingClassifier(n_estimators=100,
        learning_rate=1.0, max_depth=1, random_state=0)
#Train the model using the training sets and check score
model.fit(X, y)
#Predict Output
predicted = model.predict(x_test)

#Import Library
library(caret)
x <- cbind(x_train, y_train)
#Fitting model
fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)
fit <- train(y_train ~ ., data = x, method = "gbm",
             trControl = fitControl, verbose = FALSE)
predicted <- predict(fit, x_test, type = "prob")[,2]

This article was shared from the WeChat public account BigDataDigest (大数据文摘), author: BigDataDigest.


Originally published: 2015-12-02
