文章/答案/技术大牛

发布

社区首页 >问答首页 >如何使用R检验几个时间序列之间的差异

问如何使用R检验几个时间序列之间的差异
EN

Stack Overflow用户

提问于 2012-02-08 01:27:54

回答 2查看 1.8K关注 0票数 3

我有许多时间序列，每个时间序列代表一个植物物种。我认为有一种模式取决于木本植物的密度。高密度木本植物在雨季之间才开花。低密度木本植物在任何时期都能开花。

对于许多树种时间序列和木本植物密度的测量，我如何用R建模来演示这种模式？

以下是数据外观的示例：

#Woody Density
set.seed(69)
wden<-round(c(rnorm(5,mean=50),rnorm(5,mean=90)))
names(wden)<-c(paste("sp",1:10,sep=""))
wden

#Chuva
rain<-c(150,100,50,40,20,20,30,50,70,100,150,200,150,100,50,30,20,20,40,50,70,100,150,200)


#Flowering measures
ydet<-c(10,10,10,10,20,40,50,40,20,10,10,10)

#2 years for 5 low woody density and 5 high density species
flowering<-matrix(NA,nrow=24, ncol=10,dimnames=list(paste("month",1:24,sep=""),paste("sp",1:10,sep="")))
for (i in 1:5) {
               flowering[,i]<-round(c(ydet+rnorm(12,mean=5,sd=5),ydet+rnorm(12,mean=5,sd=5)),digits=2)
               }
for (i in 6:10) {
               flowering[,i]<-round(c(rnorm(12,mean=30,sd=5),rnorm(12,mean=30,sd=5)),digits=2)
               }
#Changing objects to Time series
flowering<-ts(flowering)
#Plot series
plot(flowering)

#Making colors for wood density
cores<-heat.colors(10,alpha=1)
matplot(c(1:24),flowering,type="l",lwd=2,lty=1,xlab="Time",ylab="Flowering",col=cores[order(wden)])

#Plotting Rain Together with time series
bargraph<-barplot(rain/max(rain)*100,xlab="Time",ylab="Rain")
matlines(bargraph,flowering,type="l",lwd=2,lty=1,xlab="Time",ylab="Flowering",col=cores[order(wden)])
axis(1,at=bargraph,labels=1:24)
axis(4,at=seq(0,100,by=10))

time-series

statistics

回答 2

Stack Overflow用户

回答已采纳

发布于 2012-02-08 04:02:00

我可能会建议你在http://stats.stackexchange.com上或者在r-sig-ecology@r-project.org邮件列表上试试。这有点像是一罐虫子。根本的问题是，很难证明两个时间序列之间的关联是因果的，因为(特别是当两个时间序列都有规律地随时间波动时)，它们很容易简单地在大约同一时间段波动，因此看起来像是一条直线(典型的例子是太阳黑子周期和各种完全不相关的时间序列，如纽约证券交易所)。

经典的方法是独立地“白化”每个时间序列(你可以拟合周期样条或正弦模型，或者只是从季节平均值中提取模式的差异)，直到每个时间序列与白噪声无法区分，然后检查时间序列之间的交叉相关性(在滞后零处，即规则相关，或在其他滞后处表示领先/跟随模式)。在您的情况下，您可能希望看到交叉相关如何反过来随木材密度变化。

或者，您可以将数据归结为“雨季”和“旱季”，然后进行更标准的分析(通过集中处理消除了大部分相关性)，但是(1)最好有一个先验的季节划分，而不是通过查看数据来进行划分；(2)这样可能会失去一些能力和/或有趣的细微尺度模式；(3)一些基本的推断问题仍然存在--开花是与降雨有关，还是仅仅与雨季有关

不过，这是一个很好的例子。

票数 2

Stack Overflow用户

发布于 2017-05-31 05:53:49

显然，这是早就应该做的，但在过去的几年里，在处理自相关数据方面取得了一些相当大的进步。我目前也在做同样的事情，只是二项式和错误分类错误(要么做大，要么回家，对吧？)无论如何，我已经使用了来自MRSea包和Pirotta等人的Scott-Hawyard的代码。2011年，以确定自相关时间数据中的重要预测因子。下面是我的方法。我已经编写了一个自定义模型拟合函数，它很慢(有助于避免滥用和误用)来识别有意义的预测器(通过向后QIC选择)，以及执行重复的Walds测试来确定显著性的函数。在这种情况下，森林类型确实很重要，而物种并不重要。

对于我使用的阻塞结构物种，它可能比你需要的更保守(森林类型可能就足够了)，但我在选择上倾向于保守。如果您使用此代码，请引用Pirotta et al2011和MRSea包(可在CREEM网站上找到)，希望这对您有所帮助。

还要注意，anova函数可能不合适。Pirotta使用了geepack中的anova.gee函数，但该函数似乎在新版本中消失了，我还没有找到替代品。欢迎提出建议。

### Functions  

# This code  calculates do backwards stepwise QIC to select the best model,
# then repeated Walds tests to retain meaningful variables and CalcAUC to caluclate
# the area under the ROC curve for each fitted model. All code based on
# Pirotta E, Matthiopoulos J, MacKenzie M, Scott-Hayward L, Rendell L Modelling sperm whale habitat preference: a novel approach combining transect and follow data
library(geepack)
library(splines)
library(AUC)
library(PresenceAbsence)
library(ROCR)            # to build the ROC curve
library(mvtnorm)         # for rmvnorm used in predictions/plotting


SelectModel=function(ModelFull){

  # Calculate the QIC of the full model
  fullmodQ=QIC(ModelFull)
  newQIC=0
  terms=attr(ModelFull$terms,"term.labels")


  while(newQIC != fullmodQ & length(terms)>1){


    # get all the terms for the full model
    terms <- attr(ModelFull$terms,"term.labels")
    n=length(terms)

    newmodel=list()
    newQIC=list()

    newmodel[[1]]=ModelFull
    newQIC[[1]]=fullmodQ

    # Make n models with selection
    for (ii in 1:n){
      dropvar=terms[ii]
      newTerms <- terms[-match(dropvar,terms)]
      newform <- as.formula(paste(".~.-",dropvar))
      newmodel[[ii+1]] <- update(ModelFull,newform)
      newQIC[[ii+1]] =QIC(newmodel[[ii]])

    }

    # Get the model with the lowest QIC
    LowestMod=which.min(unlist(newQIC))

    if (LowestMod != 1){
      ModelFull=newmodel[[LowestMod]]
      newQIC=min(unlist(newQIC))
    } else {
      ModelFull=ModelFull
      newQIC=min(unlist(newQIC))
    }


    #end the model selection


  }
  return(ModelFull)

}

DropVarsWalds=function(ModelFull){

  # If no terms included return 
  if (length(attr(ModelFull$terms,"term.labels"))<2){
    NewModel='No Covariates to select from'

  }else{


    OldModel=ModelFull
    # Get the anova values
    temp=anova(ModelFull)

    # Make n models with selection
    while(length(which(temp$`P(>|Chi|)`>.05))>0 & is.data.frame(temp)){


      # get the maximum value
      dropvar=rownames(temp)[which.max(temp$`P(>|Chi|)`)]

      # new formula for the full model
      newform <- as.formula(paste(".~.-",dropvar))

      # new full model
      ModelFull= update(ModelFull,newform) 

      # Get the model covariate names
      terms <- attr(ModelFull$terms,"term.labels")

      # # Get the anova values
      # temp=anova(ModelFull)

      temp=tryCatch({anova(ModelFull)}, error=function(e){e})


    }

    NewModel=ModelFull
  }

  return(NewModel)
}


# functions to produce partial plots (largely based on MRSea, Scott-Hawyard)
partialDF=function(mod, data, Variable){

  coefpos <- c(1, grep(Variable, colnames(model.matrix(mod))))
  xvals <- data[, which(names(data) == Variable)]
  newX <- seq(min(xvals), max(xvals), length = 500)
  eval(parse(text = paste(Variable, "<- newX", 
                          sep = "")))
  response <- rep(1, 500)
  newBasis <- eval(parse(text = labels(terms(mod))[grep(Variable, 
                                                        labels(terms(mod)))]))
  partialfit <- cbind(rep(1, 500), newBasis) %*% coef(mod)[coefpos]
  rcoefs <- NULL
  try(rcoefs <- rmvnorm(1000, coef(mod), summary(mod)$cov.scaled), 
      silent = T)
  if (is.null(rcoefs) || length(which(is.na(rcoefs) == 
                                      T)) > 0) {
    rcoefs <- rmvnorm(1000, coef(mod), as.matrix(nearPD(summary(mod)$cov.scaled)$mat))
  }
  rpreds <- cbind(rep(1, 500), newBasis) %*% t(rcoefs[, 
                                                      coefpos])
  quant.func <- function(x) {
    quantile(x, probs = c(0.025, 0.975))
  }
  cis <- t(apply(rpreds, 1, quant.func))

  partialfit <- mod$family$linkinv(partialfit)
  cis <- mod$family$linkinv(cis)

  fitdf=data.frame(x=newX, y=partialfit, LCI=cis[,1], UCI=cis[,2]) 
  colnames(fitdf)[1]=Variable
  return(fitdf)
}

partialdf_factor=function(mod, data, variable){
  coeffac <- c(1,grep(variable, colnames(model.matrix(mod))))
  coefradial <- c(grep("LocalRadialFunction", colnames(model.matrix(mod))))
  coefpos <- coeffac[which(is.na(match(coeffac, coefradial)))]
  xvals <- data[, which(names(data) == variable)]
  newX <- sort(unique(xvals))
  newX <- newX[2:length(newX)]
  partialfit <- coef(mod)[c(coefpos)]
  rcoefs <- NULL
  try(rcoefs <- rmvnorm(1000, coef(mod), summary(mod)$cov.scaled), 
      silent = T)
  if (is.null(rcoefs) || length(which(is.na(rcoefs) == T)) > 0) {
    rcoefs <- rmvnorm(1000, coef(mod), as.matrix(nearPD(summary(mod)$cov.scaled)$mat))
  }

  if((length(coefpos))>1){

    rpreds <- as.data.frame(rcoefs[, c(coefpos)])
    fitdf_year=data.frame(vals=rpreds[,1])
    fitdf_year$FactorVariable=as.factor(paste(variable, levels(xvals)[1], sep = ''))

    ############################
    # Recompile for plotting #
    #############################


    for(jj in 2:ncol(rpreds)){
      temp=data.frame(vals=rpreds[,jj])
      temp$FactorVariable=as.factor(colnames(rpreds)[jj])
      fitdf_year=rbind(fitdf_year, temp)
      rm(temp)
    }

  }else{

    rpreds <- rcoefs[,coefpos]
    fitdf_year=data.frame(vals=rpreds)
    fitdf_year$FactorVariable=colnames(model.matrix(mod))[coefpos[2:length(coefpos)]]

  }

  fitdf=fitdf_year
  colnames(fitdf)[2]=variable

  return(fitdf)


}


#Reshape your data

temp=data.frame(flowering)
flowering=data.frame(y=temp[,1], spp=colnames(temp)[1])

for(ii in 2:ncol(temp)){

  flowering=rbind(flowering, data.frame(y=temp[,ii], spp=colnames(temp)[ii]))

}

flowering$rain=rep(rain, ncol(temp))
flowering$woods=as.factor(c(rep('dense',nrow(temp)*5), rep('light',nrow(temp)*5)))

fulmod=geeglm(formula=y~bs(rain)+woods, 
              family = gaussian, 
              id=spp, 
              data=flowering, 
              corstr = 'ar1')

# Use stepwise QIC to select the best model
QICmod=SelectModel(fulmod)

# Use repeated walds to select signficinat variables
sig_mod=DropVarsWalds(ModelFull = QICmod)

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/9180876

复制

相似问题

问如何使用R检验几个时间序列之间的差异
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何使用R检验几个时间序列之间的差异EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何使用R检验几个时间序列之间的差异
EN