我是随机森林(以及蟒蛇)的新手。我使用的是随机森林分类器,数据集定义为't2002‘。
t2002.column
下面是几个列:
Index(['IndividualID', 'ES2000_B01ID', 'NSSec_B03ID', 'Vehicle',
'Age_B01ID',
'IndIncome2002_B02ID', 'MarStat_B01ID', 'EcoStat_B03ID',
'MainMode_B03ID', 'TripStart_B02ID', 'TripEnd_B02ID',
'TripDisIncSW_B01ID', 'TripTotalTime_B01ID', 'TripTravTime_B01ID',
'TripPurpFrom_B01ID', 'TripPurpTo_B01ID'],
dtype='object')
我使用如下代码来运行分类器:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
X_all = t2002.drop(['MainMode_B03ID'],axis=1)
y_all = t2002['MainMode_B03ID']
p = 0.2
X_train,X_test, y_train, y_test = train_test_split(X_all,y_all,test_size=p,
random_state=23)
clf = RandomForestClassifier()
acc_scorer = make_scorer(accuracy_score)
parameters = {
} # parameter is blank
grid_obj = GridSearchCV(clf,parameters,scoring=acc_scorer)
grid_obj = grid_obj.fit(X_train,y_train)
clf = grid_obj.best_estimator_
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print(accuracy_score(y_test,predictions))
在这种情况下,我如何使用pandas生成交叉表(如表格)来显示详细的预测结果?
提前感谢!
发布于 2018-08-23 05:37:25
您可以先使用sklearn创建混淆矩阵,然后将其转换为pandas数据帧。
from sklearn.metrics import confusion_matrix
#creating confusion matrix as array
confusion = confusion_matrix(t2002['MainMode_B03ID'].tolist(),predictions)
#converting to df
new_df = pd.DataFrame(confusion,
index = t2002['MainMode_B03ID'].unique(),
columns = t2002['MainMode_B03ID'].unique())
发布于 2018-08-30 00:58:08
使用pandas很容易显示所有预测结果。按照docs中的说明使用cv_results_
。
import pandas as pd
results = pd.DataFrame(clf.cv_results_) # clf is the GridSearchCV object
print(results.head())
https://stackoverflow.com/questions/51977643
复制