# Exclusive | Decision Tree vs. Random Forest: Which Algorithm Should You Use? (with Code & Links)

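• A Brief Introduction to Decision Trees
• An Overview of Random Forest
• Clash of Random Forest and Decision Tree (in Code)
• Why Does Random Forest Outperform a Decision Tree?
• Decision Tree vs. Random Forest: When Should You Choose Which Algorithm?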

“Why does the decision tree check the credit score first, rather than the income?”
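One common answer is that a tree splits first on the feature that reduces label impurity the most. The sketch below illustrates this with entropy-based information gain, matching the `criterion='entropy'` setting used later in the article. The toy applicants, the feature values and the helper names `entropy` and `information_gain` are all hypothetical and are not taken from the article's loan dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    weighted_child_entropy = sum(
        len(group) / total * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted_child_entropy


# Hypothetical toy data: eight loan applicants.
rows = [
    {"credit_score": "good", "income": "high"},
    {"credit_score": "good", "income": "low"},
    {"credit_score": "good", "income": "high"},
    {"credit_score": "good", "income": "low"},
    {"credit_score": "bad", "income": "high"},
    {"credit_score": "bad", "income": "low"},
    {"credit_score": "bad", "income": "high"},
    {"credit_score": "bad", "income": "low"},
]
labels = ["approve", "approve", "approve", "reject",
          "reject", "reject", "reject", "reject"]

gains = {f: information_gain(rows, labels, f) for f in ("credit_score", "income")}
best = max(gains, key=gains.get)
print(gains)
print("Feature chosen for the first split:", best)
```

In this made-up data a bad credit score always leads to rejection, so splitting on it leaves much purer groups than splitting on income, and it is the feature a greedy tree would test first.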

• Tree-Based Algorithms: A Complete Tutorial from Scratch (in R & Python)

https://www.analyticsvidhya.com/blog/2016/04/tree-based-algorithms-complete-tutorial-scratch-in-python/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

• Getting Started with Decision Trees (Free Course)

https://courses.analyticsvidhya.com/courses/getting-started-with-decision-trees?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

“The random forest algorithm combines the output of multiple (randomly created) decision trees to generate the final output.”
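The idea in that quote can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not scikit-learn's implementation: depth-1 trees (decision stumps) stand in for full decision trees, each one is fit on a bootstrap sample using a random subset of features, and the final answer is a majority vote. The data and the helper names `fit_stump`, `fit_forest` and `predict_forest` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # No valid split in this sample: fall back to a constant prediction.
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=42):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # sample with replacement
        feats = rng.sample(range(n_features), max_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual tree predictions."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: [credit_history, income]; label 1 = loan approved.
X = [[1, 60], [1, 75], [1, 55], [1, 80], [0, 20], [0, 35], [0, 25], [0, 40]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(X, y)
print(predict_forest(forest, [1, 90]), predict_forest(forest, [0, 10]))
```

Because every tree sees a different sample and a different feature subset, their individual mistakes tend to differ, and the vote smooths them out. That is the intuition behind the ensemble, although real random forests grow much deeper trees.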

• Building a Random Forest from Scratch & Understanding Real-World Data Products

https://www.analyticsvidhya.com/blog/2018/12/building-a-random-forest-from-scratch-understanding-real-world-data-products-ml-for-programmers-part-3/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

• Random Forest Hyperparameter Tuning: A Beginner's Guide

https://www.analyticsvidhya.com/blog/2020/03/beginners-guide-random-forest-hyperparameter-tuning/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

• A Comprehensive Guide to Ensemble Learning (with Python Code)

https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

• How to Build Ensemble Models in Machine Learning? (with Code in R)

https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

https://www.analyticsvidhya.com/blog/2016/07/practical-guide-data-preprocessing-python-scikit-learn/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

```python
# Data preprocessing and null-value imputation
# (assumes `df` is the loan dataset, already loaded as a pandas DataFrame)

# Label encoding
df['Gender'] = df['Gender'].map({'Male': 1, 'Female': 0})
df['Married'] = df['Married'].map({'Yes': 1, 'No': 0})
df['Dependents'] = df['Dependents'].replace('3+', 3)
df['Self_Employed'] = df['Self_Employed'].map({'Yes': 1, 'No': 0})
df['Property_Area'] = df['Property_Area'].map({'Semiurban': 1, 'Urban': 2, 'Rural': 3})
df['Loan_Status'] = df['Loan_Status'].map({'Y': 1, 'N': 0})

# Null-value imputation: mode for categorical columns, mean for numeric ones.
# A dict with repeated np.nan keys keeps only its last entry, so each column
# is filled separately instead.
for col in ['Gender', 'Married', 'Dependents', 'Self_Employed', 'Credit_History']:
    df[col] = df[col].fillna(df[col].mode()[0])
for col in ['LoanAmount', 'Loan_Amount_Term']:
    df[col] = df[col].fillna(df[col].mean())
```

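```python
from sklearn.model_selection import train_test_split

# Features: every column except the identifier and the target
X = df.drop(columns=['Loan_ID', 'Loan_Status']).values
Y = df['Loan_Status'].values

# Hold out 20% of the rows for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
```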

```python
print('Shape of X_train=>', X_train.shape)
print('Shape of X_test=>', X_test.shape)
print('Shape of Y_train=>', Y_train.shape)
print('Shape of Y_test=>', Y_test.shape)
```

```python
# Building the decision tree
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(criterion='entropy', random_state=42)
dt.fit(X_train, Y_train)
dt_pred_train = dt.predict(X_train)
```

https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

```python
from sklearn.metrics import f1_score

# Evaluation on the training set
dt_pred_train = dt.predict(X_train)
print('Training Set Evaluation F1-Score=>', f1_score(Y_train, dt_pred_train))

# Evaluation on the test set
dt_pred_test = dt.predict(X_test)
print('Testing Set Evaluation F1-Score=>', f1_score(Y_test, dt_pred_test))
```
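For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it by hand for the binary case; the helper name `f1_from_labels` and the example labels are hypothetical, and in practice `sklearn.metrics.f1_score` is the function to use.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """Binary F1 score: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
score = f1_from_labels(y_true, y_pred)
print(score)  # precision = recall = 0.75, so F1 = 0.75
```

A large gap between the training and test values of this score is a typical sign that a model has overfit the training data.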

```python
# Building the random forest classifier
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(criterion='entropy', random_state=42)
rfc.fit(X_train, Y_train)

# Evaluation on the training set
rfc_pred_train = rfc.predict(X_train)
print('Training Set Evaluation F1-Score=>', f1_score(Y_train, rfc_pred_train))

# Evaluation on the test set
rfc_pred_test = rfc.predict(X_test)
print('Testing Set Evaluation F1-Score=>', f1_score(Y_test, rfc_pred_test))
```

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Feature importances of both models, indexed by feature name
feature_importance = pd.DataFrame({
    'rfc': rfc.feature_importances_,
    'dt': dt.feature_importances_
}, index=df.drop(columns=['Loan_ID', 'Loan_Status']).columns)
feature_importance = feature_importance.sort_values(by='rfc', ascending=True)

index = np.arange(len(feature_importance))

# Side-by-side horizontal bars, one pair per feature
fig, ax = plt.subplots(figsize=(18, 8))
rfc_feature = ax.barh(index, feature_importance['rfc'], 0.4, color='purple', label='Random Forest')
dt_feature = ax.barh(index + 0.4, feature_importance['dt'], 0.4, color='lightgreen', label='Decision Tree')
ax.set(yticks=index + 0.4, yticklabels=feature_importance.index)
ax.legend()
plt.show()
```

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html#sklearn.ensemble.BaggingClassifier

“Random forest is suitable for situations where we have a large dataset and interpretability is not a major concern.”
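To illustrate the interpretability side of that trade-off, a single fitted decision tree can be printed as plain if/else rules with scikit-learn's `export_text`, something that has no direct equivalent for a forest of a hundred trees. This is a minimal sketch on made-up data; the feature names and values are hypothetical and are not the article's loan dataset.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: [credit_history, income]; label 1 = loan approved
feature_names = ["credit_history", "income"]
X = [[1, 50], [1, 20], [1, 35], [1, 45], [0, 60], [0, 25], [0, 40], [0, 30]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=42)
tree.fit(X, y)

# Human-readable decision rules of the fitted tree
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

The printed rules can be read and checked by someone with no machine learning background, which is the main reason to prefer a single tree when explanations matter more than raw accuracy.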

https://www.analyticsvidhya.com/blog/2019/08/decoding-black-box-step-by-step-guide-interpretable-machine-learning-models-python/?utm_source=blog&utm_medium=decision-tree-vs-random-forest-algorithm

Decision Tree vs. Random Forest – Which Algorithm Should you Use?

https://www.analyticsvidhya.com/blog/2020/05/decision-tree-vs-random-forest-algorithm/
