# 2 Clustering quality in Python

## 1.1 Adjusted Rand index

```
>>> from sklearn import metrics
>>> labels_true = [0, 0, 0, 1, 1, 1]
>>> labels_pred = [0, 0, 1, 1, 2, 2]

>>> metrics.adjusted_rand_score(labels_true, labels_pred)
0.24...
```


## 1.2 Mutual Information based scores

Two normalized versions of this measure are available: Normalized Mutual Information (NMI) and Adjusted Mutual Information (AMI). NMI is often used in the literature, while AMI was proposed more recently and is normalized against chance:

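```
>>> from sklearn import metrics
>>> labels_true = [0, 0, 0, 1, 1, 1]
>>> labels_pred = [0, 0, 1, 1, 2, 2]

>>> # average_method='max' reproduces the value below; newer scikit-learn defaults to 'arithmetic'
>>> metrics.adjusted_mutual_info_score(labels_true, labels_pred, average_method='max')
0.22504...
```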
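
To make "normalized against chance" concrete, the sketch below scores two labelings drawn independently at random, so they share no real structure. The sample size, cluster count, and seed are arbitrary choices for illustration, not values from the original post. NMI typically stays clearly above zero in this setting, while AMI lands close to zero.

```python
import numpy as np
from sklearn import metrics

# Two independent random labelings: any agreement between them is pure chance.
# n_samples, n_clusters and the seed are illustrative assumptions.
rng = np.random.default_rng(0)
n_samples, n_clusters = 200, 20
labels_a = rng.integers(0, n_clusters, size=n_samples)
labels_b = rng.integers(0, n_clusters, size=n_samples)

# NMI is not corrected for chance, so it is inflated when there are many small clusters.
nmi = metrics.normalized_mutual_info_score(labels_a, labels_b)
# AMI subtracts the expected mutual information of random labelings.
ami = metrics.adjusted_mutual_info_score(labels_a, labels_b)

print(f"NMI = {nmi:.3f}, AMI = {ami:.3f}")
```

The gap between the two scores tends to grow as the number of clusters approaches the number of samples, which is one reason to prefer AMI when comparing clusterings with different numbers of clusters.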


## 1.3 Homogeneity, completeness and V-measure

```
>>> from sklearn import metrics
>>> labels_true = [0, 0, 0, 1, 1, 1]
>>> labels_pred = [0, 0, 1, 1, 2, 2]

>>> metrics.homogeneity_score(labels_true, labels_pred)
0.66...

>>> metrics.completeness_score(labels_true, labels_pred)
0.42...

>>> metrics.v_measure_score(labels_true, labels_pred)
0.51...
```
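
The three scores are related: with its default weighting (`beta=1`), the V-measure is the harmonic mean of homogeneity and completeness. The short check below is only a sketch on the same toy labels; the variable names are illustrative.

```python
from sklearn import metrics

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

h = metrics.homogeneity_score(labels_true, labels_pred)
c = metrics.completeness_score(labels_true, labels_pred)
v = metrics.v_measure_score(labels_true, labels_pred)

# Harmonic mean of homogeneity and completeness
harmonic_mean = 2 * h * c / (h + c)

print(f"homogeneity={h:.4f} completeness={c:.4f}")
print(f"v_measure={v:.4f} harmonic_mean={harmonic_mean:.4f}")
```

`metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)` returns all three values in a single call.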


## 1.4 Fowlkes-Mallows scores

The Fowlkes-Mallows index (FMI) is defined as the geometric mean of the pairwise precision and recall:

```
>>> from sklearn import metrics
>>> labels_true = [0, 0, 0, 1, 1, 1]
>>> labels_pred = [0, 0, 1, 1, 2, 2]

>>> metrics.fowlkes_mallows_score(labels_true, labels_pred)
0.47140...
```
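
The definition can be checked by hand with pair counting. The sketch below assumes the usual reading: pairwise precision is the share of pairs grouped together in the prediction that are also together in the true labels, and pairwise recall is the share of truly co-clustered pairs that the prediction keeps together. Variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

from sklearn import metrics

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

tp = 0         # pairs together in both labelings
same_true = 0  # pairs together in the true labels
same_pred = 0  # pairs together in the predicted labels

for i, j in combinations(range(len(labels_true)), 2):
    in_true = labels_true[i] == labels_true[j]
    in_pred = labels_pred[i] == labels_pred[j]
    same_true += in_true
    same_pred += in_pred
    tp += in_true and in_pred

precision = tp / same_pred  # 2 / 3 on this toy example
recall = tp / same_true     # 2 / 6 on this toy example
fmi = sqrt(precision * recall)

print(fmi, metrics.fowlkes_mallows_score(labels_true, labels_pred))
```

The explicit loop over all pairs is quadratic in the number of samples and is only meant to mirror the definition; it is not how one would compute the score on real data.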


## 1.5 Silhouette Coefficient

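```
>>> import numpy as np
>>> from sklearn import datasets, metrics
>>> from sklearn.cluster import KMeans
>>> # X is not defined in the original post; the iris data is assumed here
>>> X = datasets.load_iris().data
>>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
>>> labels = kmeans_model.labels_
>>> metrics.silhouette_score(X, labels, metric='euclidean')
0.55...
```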


## 1.6 Calinski-Harabasz Index

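In other words, the smaller the covariance within each cluster and the larger the covariance between clusters, the higher the Calinski-Harabasz score. In scikit-learn the corresponding function is `metrics.calinski_harabasz_score` (spelled `calinski_harabaz_score` in versions before 0.20). It can serve as a model evaluation metric when the true cluster labels are unknown. Conversely, a low value means the between-cluster covariance is small and the boundaries between groups are not clear-cut. Compared with the silhouette coefficient, its biggest advantage in the author's view is speed: it is hundreds of times faster and runs in milliseconds.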

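```
>>> import numpy as np
>>> from sklearn import datasets, metrics
>>> from sklearn.cluster import KMeans
>>> # X is not defined in the original post; the iris data is assumed here
>>> X = datasets.load_iris().data
>>> kmeans_model = KMeans(n_clusters=3, random_state=1).fit(X)
>>> labels = kmeans_model.labels_
>>> # spelled calinski_harabaz_score before scikit-learn 0.20; the exact value can vary by version
>>> metrics.calinski_harabasz_score(X, labels)
560.39...
```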
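
The "within versus between" reading of the score can be sketched directly in NumPy. The helper `calinski_harabasz` below is a hypothetical name written for this illustration, the iris data is an assumption (the post never defines `X`), and `n_init=10` is set explicitly only to keep behavior stable across scikit-learn versions. The sketch takes the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom, and compares the result with the library function.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.cluster import KMeans

# Assumed data set: the original post does not say what X is.
X = datasets.load_iris().data
labels = KMeans(n_clusters=3, random_state=1, n_init=10).fit(X).labels_


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, scaled by degrees of freedom."""
    n_samples = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    overall_mean = X.mean(axis=0)

    between = 0.0  # how far cluster centroids sit from the overall mean
    within = 0.0   # how far points sit from their own centroid
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)

    return (between / (k - 1)) / (within / (n_samples - k))


manual = calinski_harabasz(X, labels)
library = metrics.calinski_harabasz_score(X, labels)
print(manual, library)
```

Each cluster is visited once and only means and squared distances are computed, with no pairwise distance matrix. That is consistent with the speed advantage over the silhouette coefficient noted above.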
