Q: Calculating Pearson correlation and significance in Python

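Stack Overflow user

Asked 2010-10-16 22:15:28

13 answers, 413.8K views, 0 followers, 211 votes

I am looking for a function that takes two lists as input and returns the Pearson correlation and the significance of the correlation.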

statistics

scipy

python

numpy

13 answers

Stack Overflow user

Posted 2010-10-16 22:29:57

You can have a look at scipy.stats:

from pydoc import help
from scipy.stats import pearsonr  # public import path; scipy.stats.stats is deprecated in recent SciPy
help(pearsonr)

>>>
Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)
 Calculates a Pearson correlation coefficient and the p-value for testing
 non-correlation.

 The Pearson correlation coefficient measures the linear relationship
 between two datasets. Strictly speaking, Pearson's correlation requires
 that each dataset be normally distributed. Like other correlation
 coefficients, this one varies between -1 and +1 with 0 implying no
 correlation. Correlations of -1 or +1 imply an exact linear
 relationship. Positive correlations imply that as x increases, so does
 y. Negative correlations imply that as x increases, y decreases.

 The p-value roughly indicates the probability of an uncorrelated system
 producing datasets that have a Pearson correlation at least as extreme
 as the one computed from these datasets. The p-values are not entirely
 reliable but are probably reasonable for datasets larger than 500 or so.

 Parameters
 ----------
 x : 1D array
 y : 1D array the same length as x

 Returns
 -------
 (Pearson's correlation coefficient,
  2-tailed p-value)

 References
 ----------
 http://www.statsoft.com/textbook/glosp.html#Pearson%20Correlation
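
To connect the help text back to the question, here is a minimal sketch of calling `pearsonr` on two plain lists. The sample values in `x` and `y` are made up for illustration; the only assumption about the API is that the result unpacks into the coefficient and the two-tailed p-value, as the docstring describes.

```python
from scipy.stats import pearsonr

# Hypothetical sample data; any two equal-length numeric sequences work.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# The result unpacks into (correlation coefficient, two-tailed p-value).
r, p_value = pearsonr(x, y)

print(f"r = {r:.4f}")
print(f"p = {p_value:.4f}")
```

With only five points the p-value should be read with caution, in line with the docstring's remark about small datasets.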

206 votes

Stack Overflow user

Posted 2013-04-16 08:17:58

The Pearson correlation can be calculated with numpy's corrcoef.

import numpy
numpy.corrcoef(list1, list2)[0, 1]
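
For context on the `[0, 1]` index: `numpy.corrcoef` returns the full correlation matrix rather than a single number, and it does not report a p-value. A small sketch with made-up values for `list1` and `list2`:

```python
import numpy as np

# Hypothetical sample data for illustration.
list1 = [1, 2, 3, 4, 5]
list2 = [2, 4, 5, 4, 5]

# For two 1-D inputs the result is a 2x2 symmetric matrix with ones on the diagonal.
corr_matrix = np.corrcoef(list1, list2)

# The off-diagonal entry is the correlation between list1 and list2.
r = corr_matrix[0, 1]

print(corr_matrix.shape)
print(f"r = {r:.4f}")
```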

119 votes

Stack Overflow user

Posted 2011-04-19 16:52:33

If you don't feel like installing scipy, I've used this quick hack, slightly modified from Programming Collective Intelligence:

def pearsonr(x, y):
  # Assume len(x) == len(y)
  n = len(x)
  sum_x = float(sum(x))
  sum_y = float(sum(y))
  sum_x_sq = sum(xi*xi for xi in x)
  sum_y_sq = sum(yi*yi for yi in y)
  psum = sum(xi*yi for xi, yi in zip(x, y))
  num = psum - (sum_x * sum_y/n)
  den = pow((sum_x_sq - pow(sum_x, 2) / n) * (sum_y_sq - pow(sum_y, 2) / n), 0.5)
  if den == 0: return 0
  return num / den
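
The function above returns only the coefficient, not the significance the question asks for. As a rough sketch of one possible next step, assuming the standard t-test for zero correlation (which itself assumes roughly bivariate normal data), the coefficient can be turned into a t statistic with n - 2 degrees of freedom. The helper name `pearson_t_statistic` is hypothetical and not part of the original answer.

```python
import math

def pearson_t_statistic(r, n):
    """Return the t statistic for testing zero correlation (n - 2 degrees of freedom)."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1.0:
        # A perfect linear relationship gives an unbounded statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Example: r = 0.5 measured on 27 pairs, so 25 degrees of freedom.
t = pearson_t_statistic(0.5, 27)
print(f"t = {t:.4f}")
```

Turning `t` into a p-value needs the Student's t distribution, which the standard library does not provide. A printed t-table is one option; if SciPy is available after all, `2 * scipy.stats.t.sf(abs(t), n - 2)` gives the two-tailed value.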

38 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/3949226
