# Entropy in Machine Learning: From Definition to Code

    # -*- coding: utf-8 -*-
    # @Time : 2018/4/19 17:04
    # @Author : mjautoman
    # @Site :
    # @File : tree.py
    # @Software: PyCharm
    from math import log


    def calcEntropy(dataSet):
        """Return the Shannon entropy (in bits) of the class labels in dataSet."""
        numEntries = len(dataSet)
        labelDic = {}
        for vec in dataSet:
            currentLabel = vec[-1]  # the class label is the last column
            if currentLabel not in labelDic:  # dict.has_key() was removed in Python 3
                labelDic[currentLabel] = 1
            else:
                labelDic[currentLabel] += 1
        entropy = 0.0
        for key in labelDic:
            pi = float(labelDic[key]) / numEntries  # relative frequency of this label
            entropy -= pi * log(pi, 2)
        return entropy
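For reference, the quantity that `calcEntropy` accumulates is the standard Shannon entropy of the label distribution. The symbols below are notation introduced here, not taken from the original post: $n$ is the number of rows, $K$ the number of distinct labels, and $n_k$ the number of rows carrying the $k$-th label.

$$
p_k = \frac{n_k}{n}, \qquad H = -\sum_{k=1}^{K} p_k \log_2 p_k
$$

Because the logarithm is base 2, the result is measured in bits. A data set in which every row has the same label gives $H = 0$.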

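    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # @Time : 2018/4/19 16:53
    # @Author : mjautoman
    # @Site :
    # @File : entropy.py
    # @Software: PyCharm
    from tree import calcEntropy
    import numpy as np
    import matplotlib.pyplot as plt

    '''
    Optional: plot the binary Shannon entropy
    H(x) = -x*log2(x) - (1-x)*log2(1-x) for a two-outcome variable whose
    outcomes have probabilities x and 1-x, with x in (0, 1).
    The range starts at 0.01 because log2(0) is undefined.
    x = np.arange(0.01, 1, 0.01)
    y = -x * np.log2(x) - (1 - x) * np.log2(1 - x)
    plt.plot(x, y)
    plt.show()
    '''

    # Each row is [feature, feature, class label]; calcEntropy reads only the label.
    dataSet = [[1, 1, '1'],
               [1, 0, '3'],
               [0, 1, '1'],
               [0, 1, '1'],
               [1, 1, '2'],
               [1, 0, '1'],
               [0, 1, '1'],
               [0, 1, '1']]
    entropy = calcEntropy(dataSet)
    print(entropy)  # about 1.0613 bits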
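As a cross-check on the value printed for `dataSet`, the same calculation can be sketched with `collections.Counter`. In the eight rows the label `'1'` appears six times and `'2'` and `'3'` once each, so the probabilities are 0.75, 0.125 and 0.125, and the entropy is 0.75 · log2(4/3) + 2 · 0.125 · 3 ≈ 1.0613 bits. The helper name `entropy_from_labels` is hypothetical and not part of the original `tree.py`; this is only one possible way to write it.

```python
from collections import Counter
from math import log2


def entropy_from_labels(labels):
    """Shannon entropy in bits of a sequence of hashable labels.

    Hypothetical helper for illustration; not part of the original tree.py.
    """
    counts = Counter(labels)  # label -> number of occurrences
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Same class labels as the last column of dataSet above.
labels = ['1', '3', '1', '1', '2', '1', '1', '1']
result = entropy_from_labels(labels)
print(round(result, 4))  # 1.0613
```

Counting with `Counter` replaces the manual dictionary bookkeeping, and taking labels directly (rather than full rows) keeps the helper independent of how the feature columns are laid out.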

• Original link: http://kuaibao.qq.com/s/20180421G1I4M700?refer=cp_1026