
Parallelizing model fitting and dredging (glmmTMB + dredge)

Stack Overflow user
Asked 2022-10-27 13:31:17
Answers 1 · Views 36 · Followers 0 · Votes 1

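My goal is to make dredge() as fast as possible when it is applied to glmmTMB() models. I know that both functions can be parallelized: glmmTMB() through its control argument, and dredge() through its cluster argument.

My question: to get maximum speed, can I parallelize the model fitting and the dredging at the same time? In other words, can the speed benefits of parallelizing glmmTMB() and dredge() be combined (stacked, added together)?

I tried to do this by creating two separate clusters in one R session, and, comparing the various options with microbenchmark(), I seem to have achieved my goal.

However, since I just copied the code from elsewhere, I don't know what I'm doing! I have a background in statistics and R programming, but parallelization is something I have only just started to learn. So I have a few questions:

Can the process be made even faster? Is it a good idea to create two clusters in one R session? Do the speed benefits really add up, or am I just seeing an artifact? Can anyone recommend learning resources for understanding these features better?

Thanks a lot!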

## Load libraries

library(glmmTMB)
library(microbenchmark)
library(multcomp)
library(MuMIn)
library(parallel)

## Create large dataset (idea from the glmmTMB vignette on parallel optimization)
N <- 3e5
x1 <- rnorm(N, 1, 2)
x2 <- rnorm(N, 4, 2)
x3 <- rnorm(N, 10, 2)
y <- 0.3 + 0.4 * x1 - 0.2 * x2 + 0.9 * x3 + rnorm(N, 0, 0.25)

df <- data.frame(y,
                 x1,
                 x2,
                 x3)


## Create two clusters

# create cluster "cl", but export nothing
cl  <-  parallel::makeCluster((parallel::detectCores() - 1))

# create cluster "clust" and export data and libraries (following documentation of pdredge)
clust  <-  parallel::makeCluster((parallel::detectCores() - 1))
parallel::clusterEvalQ(clust, library(glmmTMB))
parallel::clusterEvalQ(clust, library(MuMIn))
parallel::clusterExport(clust, "df")


## Compare running times for glmmTMB(): both "cl" and "clust" reduce running times
microbenchmark::microbenchmark(
  
  # No parallel
  glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
  
  # Parallel model with "cl"
  glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
  
  # Parallel model with "clust"
  glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
  
  times = 10
  
)
Unit: seconds
expr [1]: glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df)
expr [2]: glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl)))
expr [3]: glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust)))

 expr      min       lq     mean   median       uq      max neval cld
  [1] 4.526190 4.556430 4.625324 4.631528 4.670585 4.745891    10   b
  [2] 2.271729 2.282912 2.315834 2.293132 2.343508 2.393902    10  a
  [3] 2.231709 2.288383 2.382596 2.400160 2.459594 2.507514    10  a


## Compare running times when parallelization is attempted
## both for glmmTMB() and dredge()

options(na.action = "na.fail")

microbenchmark::microbenchmark(
  
  # No parallel
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
                rank = "AICc"),
  
  # Parallel glmmTMB with "cl"
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
                rank = "AICc"),
  
  # Parallel dredge with "clust"
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
              rank = "AICc", cluster = clust),
  
  # Both: parallel glmmTMB with "cl", parallel dredge with "clust"
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
              rank = "AICc", cluster = clust),

times = 10

)




Unit: seconds
expr [1]: MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df), rank = "AICc")
expr [2]: MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))), rank = "AICc")
expr [3]: MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df), rank = "AICc", cluster = clust)
expr [4]: MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))), rank = "AICc", cluster = clust)

 expr      min       lq     mean   median       uq      max neval  cld
  [1] 24.95914 25.17014 25.41935 25.27549 25.53169 26.47337    10    d
  [2] 14.21192 14.56461 15.28324 14.93494 15.88009 16.69395    10   c
  [3] 13.48460 13.66408 14.09466 13.99638 14.30151 15.40998    10  b
  [4] 11.07945 11.36578 11.75006 11.60089 12.31227 12.55529    10 a


## These other options don't work

# Parallel dredge with "cl": Not using cluster, regardless of how I parallelize the model
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
              rank = "AICc",
              cluster = cl)
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
              rank = "AICc",
              cluster = cl)

# Parallel dredge and model with "clust": Doesn't work
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
              rank = "AICc",
              cluster = clust)

1 Answer

Stack Overflow user

Accepted answer

Posted 2022-10-27 20:12:38

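You are not adding up the parallelization of the functions dredge() and glmmTMB(); the speed gain comes from exporting the packages and the data.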
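
One way to see why the two clusters are not being combined: in the question's code, `cl` reaches glmmTMB only as `length(cl)`, which is a plain integer. The `parallel` argument of `glmmTMBControl()` is documented as a number of OpenMP threads, so the cluster object itself is never handed to glmmTMB. A minimal sketch using only the parallel package (the two-worker size is an arbitrary assumption made for illustration):

```r
library(parallel)

# Create a small PSOCK cluster; the size of 2 is an arbitrary choice.
cl <- makeCluster(2)

# This is the only thing the question's code passes on to glmmTMBControl():
# a single integer, not the cluster and not its worker processes.
n_threads <- length(cl)
print(class(n_threads))
print(n_threads)

# The workers stay idle during a glmmTMB() fit, so they can be stopped
# without changing the value that was passed on.
stopCluster(cl)
```

Under this reading, `parallel = length(cl)` and `parallel = length(clust)` request the same number of threads, which would be consistent with the near-identical timings in the first benchmark.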

When you parallelize, all cores but one are already busy, so parallelizing a second time gains nothing: there are no cores left.
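
The point about running out of cores can be made concrete with a little arithmetic. The sketch below uses a hypothetical helper, `split_cores()`, which is not part of glmmTMB, MuMIn, or parallel. It assumes a fixed core count and divides it between outer dredge workers and inner threads per model fit, so that the product never exceeds the budget.

```r
# Hypothetical helper (an assumption for illustration, not a library function):
# split a fixed core budget between outer workers and inner threads per fit.
split_cores <- function(total_cores, n_workers) {
  stopifnot(total_cores >= 1, n_workers >= 1)
  # Never start more workers than there are cores.
  n_workers <- min(n_workers, total_cores)
  # Each worker gets an equal share of the remaining cores, at least one.
  threads_per_fit <- max(1L, total_cores %/% n_workers)
  list(
    n_workers       = as.integer(n_workers),
    threads_per_fit = as.integer(threads_per_fit),
    cores_used      = as.integer(n_workers * threads_per_fit)
  )
}

# Example budget: 8 cores shared by 4 workers.
budget <- split_cores(total_cores = 8, n_workers = 4)
print(budget)

# With as many workers as cores, each fit is left with a single thread.
saturated <- split_cores(total_cores = 7, n_workers = 7)
print(saturated)
```

With 8 cores and 4 workers, each fit is left with 2 threads. Once the workers already occupy every core, as in the question's setup, the inner level has only 1 thread to work with, which is the situation the answer describes.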

Votes 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link: https://stackoverflow.com/questions/74222940
