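I am doing multi-class text classification in scikit-learn. The dataset is being trained with a Multinomial Naive Bayes classifier and has hundreds of labels. Below is an excerpt from the scikit-learn script that fits the MNB model: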
from __future__ import print_function
# read file.csv into a pandas DataFrame
import pandas as pd
path = 'data/file.csv'
# skip malformed rows; on_bad_lines='skip' replaces error_bad_lines=False (pandas >= 1.3)
merged = pd.read_csv(path, on_bad_lines='skip', low_memory=False)
# define X and y using the original DataFrame
X = merged.text
y = merged.grid
# split X and y into training and testing sets
# (train_test_split lives in sklearn.model_selection; sklearn.cross_validation was removed in 0.20)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# import and instantiate CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
# create document-term matrices using CountVectorizer
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)
# import and instantiate MultinomialNB
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
# fit a Multinomial Naive Bayes model
nb.fit(X_train_dtm, y_train)
# make class predictions
y_pred_class = nb.predict(X_test_dtm)
# generate classification report
from sklearn import metrics
print(metrics.classification_report(y_test, y_pred_class))
A simplified version of the metrics.classification_report output on the command line looks like this:
             precision    recall  f1-score   support
         12       0.84      0.48      0.61      2843
         13       0.00      0.00      0.00        69
         15       1.00      0.19      0.32       232
         16       0.75      0.02      0.05       965
         33       1.00      0.04      0.07       155
          4       0.59      0.34      0.43      5600
         41       0.63      0.49      0.55      6218
         42       0.00      0.00      0.00       102
         49       0.00      0.00      0.00        11
          5       0.90      0.06      0.12      2010
         50       0.00      0.00      0.00         5
         51       0.96      0.07      0.13      1267
         58       1.00      0.01      0.02       180
         59       0.37      0.80      0.51      8127
          7       0.91      0.05      0.10       579
          8       0.50      0.56      0.53      7555
  avg/total       0.59      0.48      0.45     35919
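I would like to know whether there is any way to write the report to a standard CSV file with regular column headers.
When I redirect the command-line output to a CSV file, or copy and paste the screen output into a spreadsheet (OpenOffice Calc or Excel), all of the results end up in a single column. It looks like this: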
Posted on 2018-12-14 21:20:31
As of scikit-learn v0.20, the easiest way to convert a classification report into a pandas DataFrame is to have the report returned as a dict:
from sklearn.metrics import classification_report
report = classification_report(y_test, y_pred, output_dict=True)
Then construct a DataFrame and transpose it:
import pandas
df = pandas.DataFrame(report).transpose()
From here on, you are free to use the standard pandas methods to generate whatever output format you need (CSV, HTML, LaTeX, and so on).
See the documentation.
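As a minimal sketch of that last step, the example below writes the transposed report to a CSV file with `DataFrame.to_csv`. The toy labels and the file name `classification_report.csv` are assumptions made for illustration; in the question's script you would pass `y_test` and `y_pred_class` instead.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, assumed for illustration only
y_true = ["4", "4", "8", "8", "59", "59"]
y_pred = ["4", "8", "8", "8", "59", "4"]

# output_dict=True requires scikit-learn >= 0.20
report = classification_report(y_true, y_pred, output_dict=True)

# One row per class (plus the summary rows), one column per metric
df = pd.DataFrame(report).transpose()
df.index.name = "class"

# Write a regular CSV with a header row; the file name is hypothetical
df.to_csv("classification_report.csv")
```

Opening the resulting file in a spreadsheet should give one column each for the class label, precision, recall, f1-score and support, rather than a single column of text.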
Posted on 2016-12-09 00:33:14
If you want the individual scores, this should do the job just fine.
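import pandas as pd
from sklearn.metrics import classification_report

def classification_report_csv(report):
    report_data = []
    lines = report.split('\n')
    # skip the header lines; the number of footer lines differs between
    # scikit-learn versions, so rows without exactly five fields are skipped too
    for line in lines[2:-3]:
        row_data = line.split()  # split on any run of whitespace
        if len(row_data) != 5:
            continue  # blank line or summary row
        row = {}
        row['class'] = row_data[0]
        row['precision'] = float(row_data[1])
        row['recall'] = float(row_data[2])
        row['f1_score'] = float(row_data[3])
        row['support'] = float(row_data[4])
        report_data.append(row)
    dataframe = pd.DataFrame(report_data)
    dataframe.to_csv('classification_report.csv', index=False)

# note: class labels that contain spaces would need different parsing
report = classification_report(y_true, y_pred)
classification_report_csv(report)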
Posted on 2017-02-26 18:02:12
We can get the actual values from the precision_recall_fscore_support function and then put them into a DataFrame. The code below gives the same result, but now in a pandas DataFrame:
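import pandas as pd
from sklearn import metrics

# per-class precision, recall, F-score and support, each returned as an array
clf_rep = metrics.precision_recall_fscore_support(true, pred)
out_dict = {
    "precision": clf_rep[0].round(2),
    "recall": clf_rep[1].round(2),
    "f1-score": clf_rep[2].round(2),
    "support": clf_rep[3],
}
out_df = pd.DataFrame(out_dict, index=nb.classes_)
# summary row: unweighted mean of each score column, sum of the support column
avg_tot = (out_df.apply(lambda x: round(x.mean(), 2) if x.name != "support" else round(x.sum(), 2)).to_frame().T)
avg_tot.index = ["avg/total"]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
out_df = pd.concat([out_df, avg_tot])
print(out_df)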
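One caveat: the summary row computed above is a plain mean of the per-class scores, whereas the avg/total row in the printed report is weighted by support. The sketch below is one possible variation, not part of the original answer. It asks `precision_recall_fscore_support` for the weighted average directly and passes `labels=` explicitly so the rows are guaranteed to line up with the index. The toy labels and the output file name are assumptions for illustration.

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

# Toy labels, assumed for illustration only
y_true = ["4", "4", "8", "8", "59", "59"]
y_pred = ["4", "8", "8", "8", "59", "4"]
labels = sorted(set(y_true))  # explicit row order for the per-class scores

# Per-class arrays, in the order given by `labels`
p, r, f, s = precision_recall_fscore_support(y_true, y_pred, labels=labels)
per_class = pd.DataFrame(
    {"precision": p, "recall": r, "f1-score": f, "support": s}, index=labels
)

# Support-weighted averages (support itself is returned as None here)
wp, wr, wf, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="weighted"
)
summary = pd.DataFrame(
    {"precision": wp, "recall": wr, "f1-score": wf, "support": s.sum()},
    index=["weighted avg"],
)

out_df = pd.concat([per_class, summary]).round(2)
# Hypothetical file name; index_label gives the first column a header
out_df.to_csv("classification_report_weighted.csv", index_label="class")
```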
https://stackoverflow.com/questions/39662398