首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >用R模拟高维多元正态数据

用R模拟高维多元正态数据
EN

Stack Overflow用户
提问于 2022-09-29 16:44:54
回答 3查看 112关注 0票数 0

我试图用n= 100和p= 400在R中模拟高维多元正态数据(两组不同的变量具有一定的相关性)。以下是我的代码:

代码语言:javascript
运行
复制
## load library MASS
library(MASS)

## sample size set to n = 100
sample_size <- 100      

## I try to simulate two different groups of variables for each with 200 variables                  
sample_meanvector <- c(runif(200,0,1), runif(200,6,8))

## covariance matrix, some variables set to be correlated
sample_covariance_matrix <- matrix(NA, nrow = 400, ncol = 400)
diag(sample_covariance_matrix) <- 1
set.seed(666)
sample_covariance_matrix[lower.tri(sample_covariance_matrix)] <- runif(79800, 0.00001, 0.2)
sample_covariance_matrix[lower.tri(sample_covariance_matrix)][sample(1:79800, 10000)] <- runif(10000, 0.6, 0.9)

## make the matrix symmetric
sample_covariance_matrix[upper.tri(sample_covariance_matrix)]<-t(sample_covariance_matrix)[upper.tri(sample_covariance_matrix)]

## create multivariate normal distribution
sample_distribution <- mvrnorm(n = sample_size,
                               mu = sample_meanvector,
                               Sigma = sample_covariance_matrix)

但是,每次运行这个mvrnorm函数时,我都会得到以下错误:

mu范数中的

误差(n= sample_size,mu= sample_meanvector,Sigma = sample_covariance_matrix):'Sigma‘不是正定的

我有两个问题:

  1. 为什么会有这个错误?
  2. 如何编辑我的代码来模拟高维多变量正常数据?

非常感谢!

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2022-09-29 21:20:01

下面是一种可以生成高度相关矩阵的方法:

代码语言:javascript
运行
复制
library(MASS)
library(Matrix)

sample_size <- 100      
sample_meanvector <- c(runif(200,0,1), runif(200,6,8))
sample_covariance_matrix <- matrix(NA, nrow = 400, ncol = 400)
diag(sample_covariance_matrix) <- 1
set.seed(666)

mat_Sim <- matrix(data = NA, nrow = 400, ncol = 400)
U <- runif(n = 400) * 0.5

for(i in 1 : 400)
{
  if(i <= 350)
  {
    U_Star <- pmin(U + 0.25 * runif(n = 400), 0.99999)
    
  }else
  {
    U_Star <- pmin(pmax(U + sample(c(-1, 1), size = 400, replace = TRUE) * runif(n = 400), 0.00001), 0.99999)
  }
  
  mat_Sim[, i] <- qnorm(U_Star)  
}

cor_Mat <- cor(mat_Sim)
sample_covariance_matrix <- cor_Mat * 200

## create multivariate normal distribution
sample_distribution <- mvrnorm(n = sample_size,
                               mu = sample_meanvector,
                               Sigma = sample_covariance_matrix)

image(cor_Mat)

7/8矩阵的相关系数在0.6 ~ 0.8之间,1/8矩阵的相关系数在0~ 0.3之间。

票数 1
EN

Stack Overflow用户

发布于 2022-09-29 20:57:46

您可以考虑以下代码:

代码语言:javascript
运行
复制
library(MASS)
library(Matrix)

sample_size <- 100      
sample_meanvector <- c(runif(200,0,1), runif(200,6,8))
sample_covariance_matrix <- matrix(NA, nrow = 400, ncol = 400)
diag(sample_covariance_matrix) <- 1
set.seed(666)
sample_covariance_matrix[lower.tri(sample_covariance_matrix)] <- runif(79800, 0.00001, 0.2)
sample_covariance_matrix[lower.tri(sample_covariance_matrix)][sample(1:79800, 10000)] <- runif(10000, 0.6, 0.9)
sample_covariance_matrix[upper.tri(sample_covariance_matrix)]<-t(sample_covariance_matrix)[upper.tri(sample_covariance_matrix)]
sample_covariance_matrix_Near_PD <- nearPD(sample_covariance_matrix)$mat


## create multivariate normal distribution
sample_distribution <- mvrnorm(n = sample_size,
                               mu = sample_meanvector,
                               Sigma = sample_covariance_matrix_Near_PD)
票数 0
EN

Stack Overflow用户

发布于 2022-09-30 14:21:51

在我的问题中,我把高相关性和低相关性放在同一个协方差矩阵中。在此基础上,利用“查找正定矩阵”函数破坏了高相关性。然而,当我分别模拟这两个部分时,高相关性不会受到太大的影响。以下是我的最新代码:

代码语言:javascript
运行
复制
# load library MASS
library(MASS)
sample_size <- 100

##### Saperate high and low correlation part ####
### High correlation ###
sample_mean_high <- c(runif(50, -1 , 1))
sample_cov_high <- matrix(0, nrow = 50, ncol = 50)
diag(sample_cov_high) <- 1
sample_cov_high[lower.tri(sample_cov_high)] <- runif(1225, 0.6, 0.9)
sample_cov_high[upper.tri(sample_cov_high)] <- t(sample_cov_high[lower.tri(sample_cov_high)])
sample_cov_high <- sfsmisc::posdefify(sample_cov_high)
# create multivariate normal distribution
sample_dist_high <- mvrnorm(n = sample_size, mu = sample_mean_high, Sigma = sample_cov_high)

### Low correlation ###
sample_mean_low <- c(runif(350, -1 , 1))
sample_cov_low <- matrix(0, nrow = 350, ncol = 350)
diag(sample_cov_low) <- 1
sample_cov_low[lower.tri(sample_cov_low)] <- runif(61075, -0.5, 0.5)
sample_cov_low[upper.tri(sample_cov_low)] <- t(sample_cov_low[lower.tri(sample_cov_low)])
sample_cov_low <- sfsmisc::posdefify(sample_cov_low)
# create multivariate normal distribution
sample_dist_low <- mvrnorm(n = sample_size, mu = sample_mean_low, Sigma = sample_cov_low)

### Check correlation ###
cor_high <- cor(sample_dist_high)
hist(cor_high[lower.tri(cor_high)])
cor_low <- cor(sample_dist_low)
hist(cor_low[lower.tri(cor_low)])
cor_between <- cor(sample_dist_high, sample_dist_low)
hist(cor_between[lower.tri(cor_between)])

然后我得到了我所期望的相关性。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/73899053

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档