首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >如何使用R检验几个时间序列之间的差异

如何使用R检验几个时间序列之间的差异
EN

Stack Overflow用户
提问于 2012-02-08 01:27:54
回答 2查看 1.8K关注 0票数 3

我有许多时间序列,每个时间序列代表一个植物物种。我认为有一种模式取决于木本植物的密度。高密度木本植物在雨季之间才开花。低密度木本植物在任何时期都能开花。

对于许多树种时间序列和木本植物密度的测量,我如何用R建模来演示这种模式?

以下是数据外观的示例:

代码语言:javascript
运行
复制
#Woody Density
set.seed(69)
wden<-round(c(rnorm(5,mean=50),rnorm(5,mean=90)))
names(wden)<-c(paste("sp",1:10,sep=""))
wden

#Chuva
rain<-c(150,100,50,40,20,20,30,50,70,100,150,200,150,100,50,30,20,20,40,50,70,100,150,200)


#Flowering measures
ydet<-c(10,10,10,10,20,40,50,40,20,10,10,10)

#2 years for 5 low woody density and 5 high density species
flowering<-matrix(NA,nrow=24, ncol=10,dimnames=list(paste("month",1:24,sep=""),paste("sp",1:10,sep="")))
for (i in 1:5) {
               flowering[,i]<-round(c(ydet+rnorm(12,mean=5,sd=5),ydet+rnorm(12,mean=5,sd=5)),digits=2)
               }
for (i in 6:10) {
               flowering[,i]<-round(c(rnorm(12,mean=30,sd=5),rnorm(12,mean=30,sd=5)),digits=2)
               }
#Changing objects to Time series
flowering<-ts(flowering)
#Plot series
plot(flowering)

#Making colors for wood density
cores<-heat.colors(10,alpha=1)
matplot(c(1:24),flowering,type="l",lwd=2,lty=1,xlab="Time",ylab="Flowering",col=cores[order(wden)])

#Plotting Rain Together with time series
bargraph<-barplot(rain/max(rain)*100,xlab="Time",ylab="Rain")
matlines(bargraph,flowering,type="l",lwd=2,lty=1,xlab="Time",ylab="Flowering",col=cores[order(wden)])
axis(1,at=bargraph,labels=1:24)
axis(4,at=seq(0,100,by=10))
EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2012-02-08 04:02:00

我可能会建议你在http://stats.stackexchange.com上或者在r-sig-ecology@r-project.org邮件列表上试试。这有点像是一罐虫子。根本的问题是,很难证明两个时间序列之间的关联是因果的,因为(特别是当两个时间序列都有规律地随时间波动时),它们很容易简单地在大约同一时间段波动,因此看起来像是一条直线(典型的例子是太阳黑子周期和各种完全不相关的时间序列,如纽约证券交易所)。

经典的方法是独立地“白化”每个时间序列(你可以拟合周期样条或正弦模型,或者只是从季节平均值中提取模式的差异),直到每个时间序列与白噪声无法区分,然后检查时间序列之间的交叉相关性(在滞后零处,即规则相关,或在其他滞后处表示领先/跟随模式)。在您的情况下,您可能希望看到交叉相关如何反过来随木材密度变化。

或者,您可以将数据归结为“雨季”和“旱季”,然后进行更标准的分析(通过集中处理消除了大部分相关性),但是(1)最好有一个先验的季节划分,而不是通过查看数据来进行划分;(2)这样可能会失去一些能力和/或有趣的细微尺度模式;(3)一些基本的推断问题仍然存在--开花是与降雨有关,还是仅仅与雨季有关

不过,这是一个很好的例子。

票数 2
EN

Stack Overflow用户

发布于 2017-05-31 05:53:49

显然,这是早就应该做的,但在过去的几年里,在处理自相关数据方面取得了一些相当大的进步。我目前也在做同样的事情,只是二项式和错误分类错误(要么做大,要么回家,对吧?)无论如何,我已经使用了来自MRSea包和Pirotta等人的Scott-Hawyard的代码。2011年,以确定自相关时间数据中的重要预测因子。下面是我的方法。我已经编写了一个自定义模型拟合函数,它很慢(有助于避免滥用和误用)来识别有意义的预测器(通过向后QIC选择),以及执行重复的Walds测试来确定显著性的函数。在这种情况下,森林类型确实很重要,而物种并不重要。

对于我使用的阻塞结构物种,它可能比你需要的更保守(森林类型可能就足够了),但我在选择上倾向于保守。如果您使用此代码,请引用Pirotta et al2011和MRSea包(可在CREEM网站上找到),希望这对您有所帮助。

还要注意,anova函数可能不合适。Pirotta使用了geepack中的anova.gee函数,但该函数似乎在新版本中消失了,我还没有找到替代品。欢迎提出建议。

代码语言:javascript
运行
复制
### Functions  

# This code  calculates do backwards stepwise QIC to select the best model,
# then repeated Walds tests to retain meaningful variables and CalcAUC to caluclate
# the area under the ROC curve for each fitted model. All code based on
# Pirotta E, Matthiopoulos J, MacKenzie M, Scott-Hayward L, Rendell L Modelling sperm whale habitat preference: a novel approach combining transect and follow data
library(geepack)
library(splines)
library(AUC)
library(PresenceAbsence)
library(ROCR)            # to build the ROC curve
library(mvtnorm)         # for rmvnorm used in predictions/plotting


SelectModel=function(ModelFull){

  # Calculate the QIC of the full model
  fullmodQ=QIC(ModelFull)
  newQIC=0
  terms=attr(ModelFull$terms,"term.labels")


  while(newQIC != fullmodQ & length(terms)>1){


    # get all the terms for the full model
    terms <- attr(ModelFull$terms,"term.labels")
    n=length(terms)

    newmodel=list()
    newQIC=list()

    newmodel[[1]]=ModelFull
    newQIC[[1]]=fullmodQ

    # Make n models with selection
    for (ii in 1:n){
      dropvar=terms[ii]
      newTerms <- terms[-match(dropvar,terms)]
      newform <- as.formula(paste(".~.-",dropvar))
      newmodel[[ii+1]] <- update(ModelFull,newform)
      newQIC[[ii+1]] =QIC(newmodel[[ii]])

    }

    # Get the model with the lowest QIC
    LowestMod=which.min(unlist(newQIC))

    if (LowestMod != 1){
      ModelFull=newmodel[[LowestMod]]
      newQIC=min(unlist(newQIC))
    } else {
      ModelFull=ModelFull
      newQIC=min(unlist(newQIC))
    }


    #end the model selection


  }
  return(ModelFull)

}

DropVarsWalds=function(ModelFull){

  # If no terms included return 
  if (length(attr(ModelFull$terms,"term.labels"))<2){
    NewModel='No Covariates to select from'

  }else{


    OldModel=ModelFull
    # Get the anova values
    temp=anova(ModelFull)

    # Make n models with selection
    while(length(which(temp$`P(>|Chi|)`>.05))>0 & is.data.frame(temp)){


      # get the maximum value
      dropvar=rownames(temp)[which.max(temp$`P(>|Chi|)`)]

      # new formula for the full model
      newform <- as.formula(paste(".~.-",dropvar))

      # new full model
      ModelFull= update(ModelFull,newform) 

      # Get the model covariate names
      terms <- attr(ModelFull$terms,"term.labels")

      # # Get the anova values
      # temp=anova(ModelFull)

      temp=tryCatch({anova(ModelFull)}, error=function(e){e})


    }

    NewModel=ModelFull
  }

  return(NewModel)
}


# functions to produce partial plots (largely based on MRSea, Scott-Hawyard)
partialDF=function(mod, data, Variable){

  coefpos <- c(1, grep(Variable, colnames(model.matrix(mod))))
  xvals <- data[, which(names(data) == Variable)]
  newX <- seq(min(xvals), max(xvals), length = 500)
  eval(parse(text = paste(Variable, "<- newX", 
                          sep = "")))
  response <- rep(1, 500)
  newBasis <- eval(parse(text = labels(terms(mod))[grep(Variable, 
                                                        labels(terms(mod)))]))
  partialfit <- cbind(rep(1, 500), newBasis) %*% coef(mod)[coefpos]
  rcoefs <- NULL
  try(rcoefs <- rmvnorm(1000, coef(mod), summary(mod)$cov.scaled), 
      silent = T)
  if (is.null(rcoefs) || length(which(is.na(rcoefs) == 
                                      T)) > 0) {
    rcoefs <- rmvnorm(1000, coef(mod), as.matrix(nearPD(summary(mod)$cov.scaled)$mat))
  }
  rpreds <- cbind(rep(1, 500), newBasis) %*% t(rcoefs[, 
                                                      coefpos])
  quant.func <- function(x) {
    quantile(x, probs = c(0.025, 0.975))
  }
  cis <- t(apply(rpreds, 1, quant.func))

  partialfit <- mod$family$linkinv(partialfit)
  cis <- mod$family$linkinv(cis)

  fitdf=data.frame(x=newX, y=partialfit, LCI=cis[,1], UCI=cis[,2]) 
  colnames(fitdf)[1]=Variable
  return(fitdf)
}

partialdf_factor=function(mod, data, variable){
  coeffac <- c(1,grep(variable, colnames(model.matrix(mod))))
  coefradial <- c(grep("LocalRadialFunction", colnames(model.matrix(mod))))
  coefpos <- coeffac[which(is.na(match(coeffac, coefradial)))]
  xvals <- data[, which(names(data) == variable)]
  newX <- sort(unique(xvals))
  newX <- newX[2:length(newX)]
  partialfit <- coef(mod)[c(coefpos)]
  rcoefs <- NULL
  try(rcoefs <- rmvnorm(1000, coef(mod), summary(mod)$cov.scaled), 
      silent = T)
  if (is.null(rcoefs) || length(which(is.na(rcoefs) == T)) > 0) {
    rcoefs <- rmvnorm(1000, coef(mod), as.matrix(nearPD(summary(mod)$cov.scaled)$mat))
  }

  if((length(coefpos))>1){

    rpreds <- as.data.frame(rcoefs[, c(coefpos)])
    fitdf_year=data.frame(vals=rpreds[,1])
    fitdf_year$FactorVariable=as.factor(paste(variable, levels(xvals)[1], sep = ''))

    ############################
    # Recompile for plotting #
    #############################


    for(jj in 2:ncol(rpreds)){
      temp=data.frame(vals=rpreds[,jj])
      temp$FactorVariable=as.factor(colnames(rpreds)[jj])
      fitdf_year=rbind(fitdf_year, temp)
      rm(temp)
    }

  }else{

    rpreds <- rcoefs[,coefpos]
    fitdf_year=data.frame(vals=rpreds)
    fitdf_year$FactorVariable=colnames(model.matrix(mod))[coefpos[2:length(coefpos)]]

  }

  fitdf=fitdf_year
  colnames(fitdf)[2]=variable

  return(fitdf)


}


#Reshape your data

temp=data.frame(flowering)
flowering=data.frame(y=temp[,1], spp=colnames(temp)[1])

for(ii in 2:ncol(temp)){

  flowering=rbind(flowering, data.frame(y=temp[,ii], spp=colnames(temp)[ii]))

}

flowering$rain=rep(rain, ncol(temp))
flowering$woods=as.factor(c(rep('dense',nrow(temp)*5), rep('light',nrow(temp)*5)))

fulmod=geeglm(formula=y~bs(rain)+woods, 
              family = gaussian, 
              id=spp, 
              data=flowering, 
              corstr = 'ar1')

# Use stepwise QIC to select the best model
QICmod=SelectModel(fulmod)

# Use repeated walds to select signficinat variables
sig_mod=DropVarsWalds(ModelFull = QICmod)
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/9180876

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档