I have two questions about the logistic regression model in scikit-learn:
Posted 2020-03-04 23:07:09
Posted 2022-10-13 10:05:25
Which statistic shows the model's predictive power?
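It depends entirely on the specific problem you are trying to solve. For example, you may have imbalanced classes, or a false negative (failing to identify a patient's malignant tumor) may cost you more than a false positive. Therefore `accuracy`, `precision` (aka positive predictive value), `recall` (aka sensitivity or true positive rate), etc. could each show your model's true power in predicting the classes, depending on what matters to you. Furthermore, in scikit-learn you can explicitly specify which scoring you're looking for. As an example, for `accuracy`: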
```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate features matrix and target vector
X, y = make_classification(n_samples=10000,
                           n_features=3,
                           n_informative=3,
                           n_redundant=0,
                           n_classes=2,
                           random_state=1)

# Create logistic regression
logit = LogisticRegression()

# Cross-validate model using accuracy (cv=3 gives the three folds shown below)
cross_val_score(logit, X, y, scoring="accuracy", cv=3)
# Output reported in the original answer (exact values depend on the scikit-learn version):
# array([0.95170966, 0.9580084 , 0.95558223])
```

More on the `scoring` parameter is available in the scikit-learn documentation.
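To make the metrics above concrete, here is a small stdlib-only sketch that computes accuracy, precision and recall by hand from confusion-matrix counts. The labels are made up for illustration (1 = malignant, 0 = benign), and `confusion_counts` and `summarize` are hypothetical helpers, not scikit-learn functions. It shows how, with imbalanced classes, accuracy can look good while recall is poor:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # correctly flagged positive
        elif p == positive:
            fp += 1          # flagged positive, actually negative
        elif t == positive:
            fn += 1          # missed positive (e.g. an undetected tumor)
        else:
            tn += 1          # correctly left negative
    return tp, fp, fn, tn


def summarize(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for the chosen positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # precision = positive predictive value
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # recall = sensitivity = true positive rate
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical, imbalanced labels: only 2 of 10 cases are positive
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(summarize(y_true, y_pred))
```

This prints `{'accuracy': 0.9, 'precision': 1.0, 'recall': 0.5}`: 90% accuracy, yet half of the positive cases are missed. Passing `positive=0` scores class 0 as the positive class instead (its recall is 1.0 here), which is one simple way to compare how well each class is predicted.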
Which statistic shows whether my model is better at predicting event 1 or event 0?
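The receiver operating characteristic (ROC) curve is a common way to evaluate the quality of a binary classifier. At every probability threshold, the ROC curve compares the true positive rate against the false positive rate, so plotting it shows how the model performs across all thresholds. In scikit-learn, we can use `roc_curve` to compute the true and false positive rates at each threshold and then plot them.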
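The idea behind `roc_curve` can be sketched in plain Python: treat each distinct predicted score as a threshold, predict positive when the score is at or above it, and record the resulting false and true positive rates. This is a simplified illustration, not scikit-learn's implementation (which, for instance, may drop intermediate points). The helper names `roc_points` and `auc` are hypothetical, the toy labels and scores are made up, and the sketch assumes both classes are present in `y_true`:

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per distinct score used as a threshold.

    An observation is predicted positive when its score >= threshold.
    Assumes y_true contains at least one 1 and at least one 0.
    """
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    # Start at a threshold above every score: nothing is predicted positive
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Toy labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(fpr, tpr)
print(auc(fpr, tpr))
```

For these toy values the points are fpr = [0.0, 0.0, 0.5, 0.5, 1.0] and tpr = [0.0, 0.5, 0.5, 1.0, 1.0], and the trapezoidal area under the curve is 0.75. A curve that hugs the top-left corner (area near 1) indicates a better classifier, while the diagonal corresponds to random guessing (area 0.5).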
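In addition, logistic regression produces soft classifications, which you can view with `predict_proba`. You can access them after fitting the model; they give the predicted probability of each class for each observation. Here is the example from above, extended: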
```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Split the data generated above (X, y) into training and test sets
features_train, features_test, target_train, target_test = train_test_split(
    X, y, test_size=0.1, random_state=1)

# Create another classifier
logit = LogisticRegression()

# Train the model
logit.fit(features_train, target_train)

# Get predicted probabilities of the positive class (column 1)
target_probabilities = logit.predict_proba(features_test)[:, 1]

# Create true and false positive rates at every threshold
false_positive_rate, true_positive_rate, threshold = roc_curve(target_test, target_probabilities)

# Plot ROC curve
plt.title("Receiver Operating Characteristic")
plt.plot(false_positive_rate, true_positive_rate)
plt.plot([0, 1], ls="--")  # diagonal reference line: a random classifier
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
```

https://datascience.stackexchange.com/questions/69143
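To see what the soft classifications from `predict_proba` look like and how they become hard labels, here is a stdlib-only sketch. It assumes the logistic model's linear scores (w·x + b) are already known; the scores below are invented, and `sigmoid`, `predict_proba_rows` and `classify` are hypothetical helpers that only mimic the two-column `[P(class 0), P(class 1)]` layout that `predict_proba` returns for 0/1 labels. Moving the threshold trades false negatives for false positives, which is what the ROC curve traces out:

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba_rows(scores):
    """Mimic the two-column layout: [P(class 0), P(class 1)] per observation."""
    return [[1.0 - sigmoid(z), sigmoid(z)] for z in scores]


def classify(prob_class1, threshold=0.5):
    """Turn soft probabilities of class 1 into hard 0/1 labels."""
    return [1 if p >= threshold else 0 for p in prob_class1]


# Hypothetical linear scores (w·x + b) for four observations
scores = [-2.0, -0.2, 0.3, 1.5]
probs = predict_proba_rows(scores)
p1 = [row[1] for row in probs]          # same role as predict_proba(...)[:, 1]
print(classify(p1))                     # threshold 0.5
print(classify(p1, threshold=0.3))      # lower threshold flags more positives
```

With the 0.5 threshold this prints `[0, 0, 1, 1]`; lowering it to 0.3 also flags the second observation (probability about 0.45), giving `[0, 1, 1, 1]`.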