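I am trying to write code that computes the Pearson and Spearman correlations of a matrix stored in Excel, and then reports whether each pair is correlated according to my threshold. My code looks like this: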
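import numpy as np
import pandas

# index_col=0 makes the row labels (A..E) the index instead of a data column
X = pandas.read_excel('excel.xlsx', index_col=0)
Y = np.corrcoef(X)
# check every coefficient in the matrix against the threshold
for row in Y:
    for value in row:
        if value >= 0.50:
            print("POSITIVE CORRELATION")
        elif value <= -0.50:
            print("NEGATIVE CORRELATION")
        else:
            print("NO CORRELATION")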
My Excel matrix file looks like this:
X1 X2 X3 X4 X5
A 12 12 16 16 19
B 23 23 23 24 24
C 16 16 20 23 48
D 23 25 22 25 13
E 56 51 51 54 69
This produces a matrix of Pearson coefficients, like this:
[[ 1. 0.76072577 0.86385074 -0.75301812 0.66995002]
[ 0.76072577 1. 0.74206343 -0.47660312 0.64827257]
[ 0.86385074 0.74206343 1. -0.93521563 0.93280718]
[-0.75301812 -0.47660312 -0.93521563 1. -0.92556091]
[ 0.66995002 0.64827257 0.93280718 -0.92556091 1. ]]
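For reference, `np.corrcoef` treats each row as one variable by default (`rowvar=True`), which is why the matrix above is 5×5 over the rows A..E rather than over the columns X1..X5. A minimal sketch, assuming the table is typed in directly instead of read from `excel.xlsx`, reproduces the A/B coefficient:

```python
import numpy as np

# The question's table, typed in directly (assumption: no Excel file involved).
rows = np.array([
    [12, 12, 16, 16, 19],  # A
    [23, 23, 23, 24, 24],  # B
    [16, 16, 20, 23, 48],  # C
    [23, 25, 22, 25, 13],  # D
    [56, 51, 51, 54, 69],  # E
])

# Default rowvar=True: each row is a variable, each column an observation.
Y = np.corrcoef(rows)
print(Y.shape)            # (5, 5)
print(round(Y[0, 1], 4))  # 0.7607, the A/B entry in the matrix above

# rowvar=False would instead correlate the columns X1..X5.
Y_cols = np.corrcoef(rows, rowvar=False)
```

Because rows are the variables, `Y[i, j]` is the coefficient between the i-th and j-th row labels, which is what makes it possible to attach names like A and B to each value.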
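When I apply the threshold, the output only says whether there is a positive or negative correlation, not which rows it refers to. I would like it to look like this instead: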
POSITIVE CORRELATION BETWEEN A AND B
POSITIVE CORRELATION BETWEEN A AND C... etc.
Is there a way to write the names of my data into the output?
Posted on 2018-09-14 15:27:45
One way I can think of:
import numpy as np
import pandas

# index_col=0 turns the first column (A..E) into the row index
X = pandas.read_excel('excel.xlsx', index_col=0)
Y = np.corrcoef(X)  # rows are treated as variables
index_list = X.index.tolist()
# visit each unordered pair (i, j) with i < j exactly once
for i, index_name in enumerate(index_list):
    for j in range(i + 1, len(index_list)):
        if Y[i, j] >= 0.50:
            print("POSITIVE CORRELATION BETWEEN {} & {}".format(index_name, index_list[j]))
        elif Y[i, j] <= -0.50:
            print("NEGATIVE CORRELATION BETWEEN {} & {}".format(index_name, index_list[j]))
        else:
            print("NO CORRELATION BETWEEN {} & {}".format(index_name, index_list[j]))
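The code above covers Pearson only, while the question also asks for Spearman. One possible alternative, sketched under some assumptions: pandas' `DataFrame.corr(method=...)` accepts both `"pearson"` and `"spearman"` and keeps the row labels in its result, and `itertools.combinations` yields each pair once. The table is typed in directly rather than read from `excel.xlsx`; the helper names `label` and `report` and the `PEARSON:`/`SPEARMAN:` prefixes are illustrative only, not part of the original answer.

```python
from itertools import combinations

import pandas as pd

# Assumption: the question's table typed in directly. With the real file,
# pd.read_excel("excel.xlsx", index_col=0) should give the same layout.
X = pd.DataFrame(
    [
        [12, 12, 16, 16, 19],
        [23, 23, 23, 24, 24],
        [16, 16, 20, 23, 48],
        [23, 25, 22, 25, 13],
        [56, 51, 51, 54, 69],
    ],
    index=["A", "B", "C", "D", "E"],
    columns=["X1", "X2", "X3", "X4", "X5"],
)

THRESHOLD = 0.50  # same cut-off as in the question


def label(r, threshold=THRESHOLD):
    """Map a coefficient to the question's three categories (hypothetical helper)."""
    if r >= threshold:
        return "POSITIVE CORRELATION"
    if r <= -threshold:
        return "NEGATIVE CORRELATION"
    return "NO CORRELATION"


def report(df, method):
    """Return one labelled line per unordered pair of rows (hypothetical helper)."""
    # DataFrame.corr correlates columns, so transpose to correlate rows A..E.
    corr = df.T.corr(method=method)
    return [
        f"{method.upper()}: {label(corr.loc[a, b])} BETWEEN {a} AND {b}"
        for a, b in combinations(corr.index, 2)
    ]


for method in ("pearson", "spearman"):
    for line in report(X, method):
        print(line)
```

Working on labelled DataFrames instead of a bare NumPy array avoids tracking positions by hand: `corr.loc[a, b]` looks the coefficient up by name, so the row labels travel with the values.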
https://stackoverflow.com/questions/-100002634