
Q: scikit-learn decision tree model evaluation

Stack Overflow user

Asked 2016-08-23 06:55:38

2 answers · 6.9K views · 0 followers · 4 votes

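Below are the relevant code and documentation. When cross_val_score is called with its defaults and no scoring is explicitly specified, does the output array contain accuracy, AUC, or some other metric?

Using Python 2.7 with the miniconda interpreter.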

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

>>> from sklearn.datasets import load_iris
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
...                             
...
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])

Regards, Lin

scikit-learn

decision-tree

python

python-2.7

machine-learning

2 Answers

Stack Overflow user

Accepted answer

Answered 2016-08-23 07:05:46

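From the user guide:

By default, the score computed at each CV iteration is the score method of the estimator. It is possible to change this by using the scoring parameter: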
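As a minimal sketch of that behaviour, the snippet below runs the question's cross-validation three times: once with no scoring argument, once with scoring="accuracy", and once with scoring="f1_macro" (a metric picked here only for illustration). It assumes scikit-learn 0.18 or later, where cross_val_score lives in sklearn.model_selection; older releases such as the one in the question expose it from sklearn.cross_validation instead.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score  # assumes scikit-learn >= 0.18
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)

# No scoring argument: cross_val_score falls back to clf.score (mean accuracy).
default_scores = cross_val_score(clf, iris.data, iris.target, cv=10)

# Asking for accuracy explicitly should reproduce the same per-fold values.
accuracy_scores = cross_val_score(
    clf, iris.data, iris.target, cv=10, scoring="accuracy")

# A different metric, chosen only to show that the scoring parameter changes the output.
f1_scores = cross_val_score(
    clf, iris.data, iris.target, cv=10, scoring="f1_macro")

print(default_scores)
print(accuracy_scores)
print(f1_scores)
```

If the first two printed arrays match, the default output is per-fold accuracy rather than AUC.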

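From the DecisionTreeClassifier documentation:

Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Don't be confused by "mean accuracy"; it is just the regular way of computing accuracy. Follow the links to the source: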

    from .metrics import accuracy_score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
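One way to check this without reading the source is to compare the two calls directly. The sketch below fits a tree on a train/test split of iris (the split, test_size and random_state values are arbitrary choices for illustration, not part of the original answer) and compares clf.score with accuracy_score applied to the predictions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split  # assumes scikit-learn >= 0.18
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Arbitrary hold-out split, used only to have unseen data to score on.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The estimator's own score method ...
score_from_method = clf.score(X_test, y_test)
# ... versus accuracy_score applied to its predictions.
score_from_metric = accuracy_score(y_test, clf.predict(X_test))

print(score_from_method, score_from_metric)
```

Both printed values should be identical, since score delegates to accuracy_score.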

Now the source for metrics.accuracy_score:

def accuracy_score(y_true, y_pred, normalize=True, sample_weight=None):
    ...
    # Compute accuracy for each possible representation
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    if y_type.startswith('multilabel'):
        differing_labels = count_nonzero(y_true - y_pred, axis=1)
        score = differing_labels == 0
    else:
        score = y_true == y_pred

    return _weighted_sum(score, sample_weight, normalize)
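To see why subset accuracy is called a harsh metric, here is a small NumPy-only sketch of the multilabel branch above. The label matrices are made-up toy data, and the per-label figure at the end is included only for contrast; it is not something accuracy_score returns.

```python
import numpy as np

# Toy multilabel indicator matrices: 3 samples, 3 labels each (made-up data).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 1],   # one wrong label in this row
                   [1, 1, 0]])

# Same idea as the multilabel branch: a sample counts as correct only if
# no label in its row differs.
differing_labels = np.count_nonzero(y_true - y_pred, axis=1)
score = differing_labels == 0

subset_accuracy = score.mean()                  # 2 of 3 rows fully correct
per_label_accuracy = (y_true == y_pred).mean()  # 8 of 9 individual labels correct

print(subset_accuracy, per_label_accuracy)
```

A single wrong label marks the whole sample as wrong, so subset accuracy (2/3 here) falls below the per-label rate (8/9).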

And if you are still not convinced:

def _weighted_sum(sample_score, sample_weight, normalize=False):
    if normalize:
        return np.average(sample_score, weights=sample_weight)
    elif sample_weight is not None:
        return np.dot(sample_score, sample_weight)
    else:
        return sample_score.sum()

Note: for accuracy_score the normalize parameter defaults to True, so it simply returns np.average of the boolean NumPy array, which is the fraction of correct predictions.
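The three branches of _weighted_sum can be traced by hand on a tiny example. The sketch below uses made-up labels and weights and calls NumPy directly rather than scikit-learn's private helper.

```python
import numpy as np

# Made-up single-label data: 4 of the 5 predictions are correct.
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1])
score = y_true == y_pred  # boolean array: [True, False, True, True, True]

# normalize=True, no weights: plain mean of the booleans.
fraction_correct = np.average(score)

# normalize=False, no weights: count of correct predictions.
count_correct = score.sum()

# Hypothetical sample weights; the misclassified sample counts double.
weights = np.array([1.0, 2.0, 1.0, 1.0, 1.0])
weighted_fraction = np.average(score, weights=weights)  # normalize=True
weighted_count = np.dot(score, weights)                 # normalize=False

print(fraction_correct, count_correct, weighted_fraction, weighted_count)
```

With these inputs the unweighted fraction is 0.8 (4 out of 5), and weighting the wrong sample twice lowers it to 4/6.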

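Votes: 3

Stack Overflow user

Answered 2016-08-23 07:05:28

If no scoring argument is given, cross_val_score defaults to the .score method of the estimator you pass in. For DecisionTreeClassifier, that is mean accuracy (as the docstring below shows):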

In [11]: DecisionTreeClassifier.score?
Signature: DecisionTreeClassifier.score(self, X, y, sample_weight=None)
Docstring:
Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.

Parameters
----------
X : array-like, shape = (n_samples, n_features)
    Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)
    True labels for X.

sample_weight : array-like, shape = [n_samples], optional
    Sample weights.

Returns
-------
score : float
    Mean accuracy of self.predict(X) wrt. y.

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/39094267
