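I am trying to compute correlations between the first three countries below (Germany, Italy, Japan) and the last three (US, Canada, UK).
For example: Germany vs US, Germany vs Canada, Germany vs UK, Italy vs US, Italy vs Canada, Italy vs UK, and so on.
However, because my data has gaps in the first few rows (for Germany and Japan), the usual cor function does not work as is. In this case the calculation should skip the first 2 rows for Germany and the first 3 rows for Japan, starting at row 3 for Germany and row 4 for Japan, so that each is compared against the matching rows of US/Canada/UK, while the correlations for Italy use its complete data.
How can I do this?
Thanks.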
library(tibble)

df11 <-
  tibble(
    date = 2001:2010,
    Germany = runif(10),
    Italy = runif(10),
    Japan = runif(10),
    US = runif(10),
    Canada = runif(10),
    UK = runif(10)
  )
# Leading gaps: first 2 rows for Germany, first 3 rows for Japan
df11$Germany[1:2] <- NA
df11$Japan[1:3] <- NA

Answered 2021-10-15 07:29:44
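You can use the mycor function, which can be found on r-bloggers:
mycor <- function(x, ...) {
  # Correlation estimate for every pair of columns;
  # cor.test() uses only the rows where both columns are non-missing
  r <- apply(x, 2, function(j) {
    apply(x, 2, function(i) {
      as.numeric(cor.test(i, j)$estimate)
    })
  })
  # Matching p-value for every pair of columns
  P <- apply(x, 2, function(j) {
    apply(x, 2, function(i) {
      as.numeric(cor.test(i, j)$p.value)
    })
  })
  # Return both matrices in a named list
  list(P = P, r = r)
}
mycor(df11)$r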
               date    Germany       Italy       Japan          US      Canada         UK
date     1.00000000  0.3829687 -0.09309048 -0.46562050 -0.44324591  0.41293491  0.7908250
Germany  0.38296868  1.0000000  0.32186956 -0.19135611 -0.49111087 -0.38151625  0.6377928
Italy   -0.09309048  0.3218696  1.00000000  0.04341171  0.09589073 -0.32724552  0.2138135
Japan   -0.46562050 -0.1913561  0.04341171  1.00000000  0.52800797  0.25226383 -0.4802936
US      -0.44324591 -0.4911109  0.09589073  0.52800797  1.00000000  0.08124373 -0.4869257
Canada   0.41293491 -0.3815163 -0.32724552  0.25226383  0.08124373  1.00000000  0.1033160
UK       0.79082499  0.6377928  0.21381349 -0.48029364 -0.48692567  0.10331600  1.0000000

Answered 2021-10-15 07:25:42
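Use cor inside combn. Instead of df11[-1], which only excludes the date column, you can select specific countries explicitly, e.g. df11[c("Germany", "Italy", "Japan")]. use="complete.obs" keeps only the complete observations; see the documentation ?cor for the other options.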
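Since the question asks only for the first group of countries against the second, the subset selection and the use options mentioned here can be combined in a single call. The following is a minimal sketch, not part of the original answer: it assumes a base data.frame stand-in for df11 with a fixed seed, and the names first, second, and cross are illustrative.

```r
# Hypothetical stand-in for the question's data (base data.frame, fixed seed)
set.seed(42)
df11 <- data.frame(
  date    = 2001:2010,
  Germany = runif(10),
  Italy   = runif(10),
  Japan   = runif(10),
  US      = runif(10),
  Canada  = runif(10),
  UK      = runif(10)
)
df11$Germany[1:2] <- NA
df11$Japan[1:3]   <- NA

first  <- c("Germany", "Italy", "Japan")
second <- c("US", "Canada", "UK")

# cor(x, y) correlates every column of x with every column of y.
# "pairwise.complete.obs" drops only the rows missing for that particular pair,
# so Germany uses rows 3-10, Japan rows 4-10, and Italy all 10 rows.
cross <- cor(df11[first], df11[second], use = "pairwise.complete.obs")
cross
```

The result is a 3 x 3 matrix with the first group as rows and the second group as columns. The all-pairs combn call from the answer follows.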
res <- combn(df11[-1], 2, cor, use="complete.obs", simplify=FALSE)
Result
res
# [[1]]
# Germany Italy
# Germany 1.00000000 -0.08586634
# Italy -0.08586634 1.00000000
#
# [[2]]
# Germany Japan
# Germany 1.0000000 0.3699611
# Japan 0.3699611 1.0000000
#
# [[3]]
# Germany US
# Germany 1.00000000 0.07002937
# US 0.07002937 1.00000000
#
# [[4]]
# Germany Canada
# Germany 1.0000000 0.3949677
# Canada 0.3949677 1.0000000
#
# [[5]]
# Germany UK
# Germany 1.0000000 0.5248062
# UK 0.5248062 1.0000000
#
# [[6]]
# Italy Japan
# Italy 1.00000000 0.09700777
# Japan 0.09700777 1.00000000
#
# [[7]]
# Italy US
# Italy 1.0000000 -0.1351394
# US -0.1351394 1.0000000
#
# [[8]]
# Italy Canada
# Italy 1.0000000 -0.1587657
# Canada -0.1587657 1.0000000
#
# [[9]]
# Italy UK
# Italy 1.0000000 -0.5379418
# UK -0.5379418 1.0000000
#
# [[10]]
# Japan US
# Japan 1.000000 -0.744641
# US -0.744641 1.000000
#
# [[11]]
# Japan Canada
# Japan 1.0000000 0.5813378
# Canada 0.5813378 1.0000000
#
# [[12]]
# Japan UK
# Japan 1.0000000 -0.1877573
# UK -0.1877573 1.0000000
#
# [[13]]
# US Canada
# US 1.0000000 -0.5947739
# Canada -0.5947739 1.0000000
#
# [[14]]
# US UK
# US 1.00000000 0.01007044
# UK 0.01007044 1.00000000
#
# [[15]]
# Canada UK
# Canada 1.0000000 0.3784406
# UK 0.3784406 1.0000000
If you are interested in the coefficients rather than the matrices, extract the off-diagonal element of each:
sapply(res, `[`, 2)
# [1] -0.08586634 0.36996112 0.07002937 0.39496772 0.52480618 0.09700777
# [7] -0.13513936 -0.15876568 -0.53794180 -0.74464099 0.58133779 -0.18775725
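# [13] -0.59477393 0.01007044 0.37844058
Or, if you want a single matrix (note that use="complete.obs" on the whole data drops every row that has an NA in any column, so some entries, such as Germany vs Italy, differ from the pairwise results above; use="pairwise.complete.obs" reproduces the pairwise values):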
cor(df11[-1], use="complete.obs")
#             Germany       Italy       Japan          US     Canada         UK
# Germany  1.00000000  0.05086447  0.36996112 -0.02672511  0.3531181  0.4261229
# Italy    0.05086447  1.00000000  0.09700777 -0.03657892 -0.1514971 -0.6038602
# Japan    0.36996112  0.09700777  1.00000000 -0.74464099  0.5813378 -0.1877573
# US      -0.02672511 -0.03657892 -0.74464099  1.00000000 -0.6574211  0.2106356
# Canada   0.35311813 -0.15149706  0.58133779 -0.65742113  1.0000000  0.2452463
# UK       0.42612287 -0.60386019 -0.18775725  0.21063562  0.2452463  1.0000000
If you intend to use cor.test, you can run combn over the column names and paste each pair into as.formula with sprintf. See above for how to take only a subset.
# \(x) is the R >= 4.1 shorthand for function(x)
res2 <- combn(names(df11[-1]), 2, \(x)
              cor.test(as.formula(sprintf('~ %s + %s', x[1], x[2])), data=df11),
              simplify=FALSE)
Result 2
res2[[1]] ## first result
# Pearson's product-moment correlation
#
# data: Germany and Italy
# t = -0.21111, df = 6, p-value = 0.8398
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# -0.7454347 0.6586606
# sample estimates:
# cor
# -0.08586634

https://stackoverflow.com/questions/69581134
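For the question's actual target (the first three countries against the last three only), the cor.test idea can be restricted to the cross-group pairs and summarised in one table. This is a sketch under assumptions rather than part of either answer: df11 is rebuilt as a base data.frame with a fixed seed, and the names first, second, pairs, tests, and out are illustrative.

```r
# Hypothetical stand-in for the question's data (base data.frame, fixed seed)
set.seed(42)
df11 <- data.frame(
  date    = 2001:2010,
  Germany = runif(10),
  Italy   = runif(10),
  Japan   = runif(10),
  US      = runif(10),
  Canada  = runif(10),
  UK      = runif(10)
)
df11$Germany[1:2] <- NA
df11$Japan[1:3]   <- NA

first  <- c("Germany", "Italy", "Japan")
second <- c("US", "Canada", "UK")

# All 9 cross-group pairs (first group x second group)
pairs <- expand.grid(x = first, y = second, stringsAsFactors = FALSE)

# cor.test() drops incomplete pairs itself, so no NA handling is needed here
tests <- lapply(seq_len(nrow(pairs)), function(i) {
  cor.test(df11[[pairs$x[i]]], df11[[pairs$y[i]]])
})

# One row per pair: estimate, p-value, and number of rows actually used
out <- data.frame(
  x = pairs$x,
  y = pairs$y,
  r = vapply(tests, function(t) unname(t$estimate), numeric(1)),
  p = vapply(tests, function(t) t$p.value, numeric(1)),
  n = vapply(tests, function(t) unname(t$parameter) + 2, numeric(1))
)
out
```

The n column is recovered from the test's degrees of freedom (df = n - 2 for the Pearson test), which makes it easy to confirm that Germany pairs use 8 rows, Japan pairs 7, and Italy pairs all 10.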