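"An aside: correlation is not causation. Correlation only tells us that, in the data, two or more factors are positively correlated, negatively correlated, or uncorrelated; it does not tell us that one of them determines the other."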
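The correlation coefficient describes the strength of association between two variables and generally lies in [-1, 1]. Common choices are the Pearson correlation coefficient, a parametric measure suited to continuous variables that are (approximately) normally distributed, and rank-based coefficients such as Spearman's, which are non-parametric and suited to continuous and ordinal (ordered categorical) variables.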
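As a rough illustration of the difference, the sketch below uses made-up data (the vectors x and y are assumptions for this example, not taken from the article) in which y increases monotonically but not linearly with x. Spearman's coefficient works on ranks, so it reports a perfect association, while Pearson's coefficient measures linear association and comes out smaller.

```r
# Made-up data: y grows monotonically, but not linearly, with x
x <- 1:20
y <- exp(x / 2)

pearson  <- cor(x, y, method = "pearson")   # linear association
spearman <- cor(x, y, method = "spearman")  # rank-based association

print(c(pearson = pearson, spearman = spearman))
```

Because the ranks of x and y are identical here, the Spearman value is 1 even though the relationship is far from a straight line.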
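cor() and cor.test() both ship with base R (in the stats package). The main difference is that cor() returns only the correlation coefficient, whereas cor.test() also reports the test statistic, the degrees of freedom (from which the sample size n can be recovered), the p-value and, for Pearson, a confidence interval.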
01
—
Correlation between two vectors
> x <- c(1:10)
> set.seed(1234)
> y <- rnorm(10,0,1)
> # "pearson" (default), "kendall", or "spearman":
> cor(x,y)
[1] -0.1069777
> cor(x,y,method = "pearson")
[1] -0.1069777
> plot(x,y) # see Figure 1
> cor.test(x,y)

        Pearson's product-moment correlation

data:  x and y
t = -0.30432, df = 8, p-value = 0.7686
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6901203  0.5603945
sample estimates:
       cor 
-0.1069777 
Above: Figure 1 (scatter plot of y against x)
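The individual numbers printed by cor.test() can also be pulled out of the returned object, which is handy when they are needed in later code. The sketch below rebuilds the same x and y as in the session above; the variable names ct, r, df, p and ci are only illustrative.

```r
x <- 1:10
set.seed(1234)
y <- rnorm(10, 0, 1)

ct <- cor.test(x, y)          # Pearson by default

r  <- unname(ct$estimate)     # correlation coefficient, same value as cor(x, y)
df <- unname(ct$parameter)    # degrees of freedom, n - 2 for Pearson
p  <- ct$p.value              # two-sided p-value
ci <- ct$conf.int             # 95% confidence interval by default

print(c(r = r, df = df, p = p))
print(ci)
```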
02
—
Correlation among multiple variables (cor)
> dt <- iris[,-5]
> head(dt)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
6          5.4         3.9          1.7         0.4
> cor(dt)
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
> cor(dt,dt)
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
# Visualize the correlation coefficients
> library(corrplot)
> corrplot(cor(dt),method = "number") # show the numbers; see Figure 2
As the output shows, when computing the correlations among the variables of a single data set, cor(dt) is equivalent to cor(dt,dt).
Above: Figure 2 (corrplot of the iris correlation matrix, shown as numbers)
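The equivalence of cor(dt) and cor(dt,dt) noted above can be checked directly, and the two-argument form becomes more useful when the two arguments differ. The sketch below is only an illustration; the names same and sepal_vs_petal are made up for this example.

```r
dt <- iris[, -5]

# cor(x) and cor(x, x) produce the same square matrix
same <- isTRUE(all.equal(cor(dt), cor(dt, dt)))
print(same)

# With two different column sets the result is rectangular:
# rows come from the first argument, columns from the second
sepal_vs_petal <- cor(dt[, 1:2], dt[, 3:4])
print(sepal_vs_petal)
```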
# install.packages("PerformanceAnalytics")
library(PerformanceAnalytics)
chart.Correlation(dt,histogram = TRUE,pch=19) # see Figure 3
Above: Figure 3 (chart.Correlation output for the iris variables)
03
—
Correlation among multiple variables with Hmisc::rcorr
> dt <- iris[,-5]
> library(Hmisc)
> res<-rcorr(as.matrix(dt)) # the raw data must be converted to a matrix here
> res
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         1.00       -0.12         0.87        0.82
Sepal.Width         -0.12        1.00        -0.43       -0.37
Petal.Length         0.87       -0.43         1.00        0.96
Petal.Width          0.82       -0.37         0.96        1.00

n= 150 

P
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length              0.1519      0.0000       0.0000     
Sepal.Width  0.1519                   0.0000       0.0000     
Petal.Length 0.0000       0.0000                   0.0000     
Petal.Width  0.0000       0.0000      0.0000                  
The code below is a custom function that tidies the rcorr output into a long table. It is taken from "Correlation matrix : A quick start guide to analyze, format and visualize a correlation matrix using R software":
http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
# ++++++++++++++++++++++++++++
# flattenCorrMatrix
# ++++++++++++++++++++++++++++
# cormat : matrix of the correlation coefficients
# pmat : matrix of the correlation p-values
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = (cormat)[ut],
    p = pmat[ut]
  )
}
Example
> flattenCorrMatrix <- function(cormat, pmat) {
+   ut <- upper.tri(cormat)
+   data.frame(
+     row = rownames(cormat)[row(cormat)[ut]],
+     column = rownames(cormat)[col(cormat)[ut]],
+     cor = (cormat)[ut],
+     p = pmat[ut]
+   )
+ }
> 
> library(Hmisc)
> dt <- iris[,-5]
> res2<-rcorr(as.matrix(dt))
> flattenCorrMatrix(res2$r, res2$P)
           row       column        cor            p
1 Sepal.Length  Sepal.Width -0.1175698 1.518983e-01
2 Sepal.Length Petal.Length  0.8717538 0.000000e+00
3  Sepal.Width Petal.Length -0.4284401 4.513314e-08
4 Sepal.Length  Petal.Width  0.8179411 0.000000e+00
5  Sepal.Width  Petal.Width -0.3661259 4.073229e-06
6 Petal.Length  Petal.Width  0.9628654 0.000000e+00
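If Hmisc is not available, a similar matrix of p-values can be assembled in base R by calling cor.test() on every pair of columns. This is only a sketch under the assumption that, for complete data and Pearson correlation, the pairwise cor.test() p-values agree with the P matrix that rcorr reports; the names vars and pmat are made up for this example.

```r
dt <- iris[, -5]
vars <- colnames(dt)

# Empty matrix to hold one p-value per pair of variables
pmat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      # Pearson test for the pair; the diagonal is left as NA
      pmat[i, j] <- cor.test(dt[[i]], dt[[j]])$p.value
    }
  }
}

print(round(pmat, 4))
```

The loop tests each pair twice (once per ordering), which is wasteful but keeps the sketch short; for four variables the cost is negligible.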
04
—
Correlation between two sets of variables with psych::corr.test
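To correlate the variables in the first half of a data frame's samples with those in the second half, use corr.test() from the psych package, which also reports p-values. Note that the two input data frames must have the same number of rows, in the same order.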
> dt <- iris[,-5]
> nrow(dt)
[1] 150
> dt_up <- dt[c(1:75),]
> dt_down <- dt[c(76:150),]
> library(psych)
> psych_cor <- corr.test(dt_up, dt_down, method = "pearson")
> psych_cor
Call:corr.test(x = dt_up, y = dt_down, method = "pearson")
Correlation matrix 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         0.06        0.14         0.10        0.14
Sepal.Width         -0.29       -0.30        -0.36       -0.43
Petal.Length         0.25        0.26         0.28        0.34
Petal.Width          0.24        0.23         0.27        0.34
Sample Size 
[1] 75
Probability values  adjusted for multiple tests. 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         0.92        0.92         0.92        0.92
Sepal.Width          0.12        0.12         0.02        0.00
Petal.Length         0.24        0.22         0.14        0.03
Petal.Width          0.24        0.25         0.17        0.04

 To see confidence intervals of the correlations, print with the short=FALSE option
> heatmap(x = psych_cor$r) # draw a heat map
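The output above notes that the probability values are adjusted for multiple tests. As far as I know, corr.test() has an adjust argument that defaults to "holm". The sketch below shows what a Holm adjustment does, using base R's p.adjust() on made-up p-values (p_raw is an assumption for this example, not data from the article).

```r
# Made-up raw p-values, for illustration only
p_raw <- c(0.01, 0.04, 0.03)

# Holm correction: the smallest p-value is multiplied by the number of tests,
# the next by one fewer, and so on, keeping the results non-decreasing
p_holm <- p.adjust(p_raw, method = "holm")

print(rbind(raw = p_raw, holm = p_holm))
```

Adjusted values are never smaller than the raw ones, which is why borderline correlations can stop being significant once many pairs are tested together.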