Q: Parallelization of model fitting and dredging (glmmTMB + dredge)

Stack Overflow user

Asked 2022-10-27 13:31:17

Answers: 1, Views: 36, Followers: 0, Votes: 1

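My goal is to speed up dredge() as much as possible when it is applied to glmmTMB() models. I know that both functions can be parallelized: glmmTMB() with the control argument, and dredge() with the cluster argument.

My question is: to get maximum speed, can I parallelize model fitting and dredging at the same time? In other words, can I combine (stack, add up) the speed benefits of parallelizing glmmTMB() and dredge()?

I tried to do this by creating two separate clusters in the R session and, after comparing the various options with microbenchmark(), I seem to have achieved my goal.

However, since I just copied the code from elsewhere, I don't really know what I'm doing! I have a background in statistics and R programming, but parallelization is something I have only just started to learn. So here are a few questions.

Can this process be made even faster? Is it a good idea to create two clusters in one R session? Can the speed benefits really add up, or am I just seeing an artifact? Can anyone recommend learning resources to understand these features better?

Thanks a lot!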

## Load libraries

library(glmmTMB)
library(microbenchmark)
library(multcomp)
library(MuMIn)
library(parallel)

## Create large dataset (idea from the glmmTMB vignette on parallel optimization)
N <- 3e5
x1 <- rnorm(N, 1, 2)
x2 <- rnorm(N, 4, 2)
x3 <- rnorm(N, 10, 2)
y <- 0.3 + 0.4 * x1 - 0.2 * x2 + 0.9 * x3 + rnorm(N, 0, 0.25)

df <- data.frame(y,
                 x1,
                 x2,
                 x3)


## Create two clusters

# create cluster "cl", but export nothing
cl  <-  parallel::makeCluster((parallel::detectCores() - 1))

# create cluster "clust" and export data and libraries (following the pdredge documentation)
clust  <-  parallel::makeCluster((parallel::detectCores() - 1))
parallel::clusterEvalQ(clust, library(glmmTMB))
parallel::clusterEvalQ(clust, library(MuMIn))
parallel::clusterExport(clust, "df")


## Compare running times for glmmTMB(): both "cl" and "clust" reduce running times
microbenchmark::microbenchmark(
  
  # No parallel
  glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
  
  # Parallel model with "cl"
  glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
  
  # Parallel model with "clust"
  glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
  
  times = 10
  
)
Unit: seconds
 expr
 [1] glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df)
 [2] glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl)))
 [3] glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust)))

 row      min       lq     mean   median       uq      max neval cld
 [1] 4.526190 4.556430 4.625324 4.631528 4.670585 4.745891    10   b
 [2] 2.271729 2.282912 2.315834 2.293132 2.343508 2.393902    10  a 
 [3] 2.231709 2.288383 2.382596 2.400160 2.459594 2.507514    10  a 


## Compare running times when parallelization is attempted
## both for glmmTMB() and dredge()

options(na.action = "na.fail")

microbenchmark::microbenchmark(
  
  # No parallel
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
                rank = "AICc"),
  
  # Parallel glmmTMB with "cl"
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
                rank = "AICc"),
  
  # Parallel dredge with "clust"
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
              rank = "AICc", cluster = clust),
  
  # Both: parallel glmmTMB with "cl", parallel dredge with "clust"
  MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
              rank = "AICc", cluster = clust),

times = 10

)




Unit: seconds
 expr
 [1] MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df), rank = "AICc")
 [2] MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))), rank = "AICc")
 [3] MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df), rank = "AICc", cluster = clust)
 [4] MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))), rank = "AICc", cluster = clust)

 row      min       lq     mean   median       uq      max neval  cld
 [1] 24.95914 25.17014 25.41935 25.27549 25.53169 26.47337    10    d
 [2] 14.21192 14.56461 15.28324 14.93494 15.88009 16.69395    10   c 
 [3] 13.48460 13.66408 14.09466 13.99638 14.30151 15.40998    10  b  
 [4] 11.07945 11.36578 11.75006 11.60089 12.31227 12.55529    10 a   
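
To judge whether the two speed-ups add up, it may help to turn the median timings reported in this benchmark into speed-up factors. The sketch below uses only those medians; the labels `serial`, `fit_only`, `dredge_only`, and `both` are names chosen here for illustration. If the two gains stacked perfectly, the combined speed-up would be close to the product of the individual ones.

```r
# Median run times in seconds, copied from the dredge benchmark output.
medians <- c(
  serial      = 25.27549,  # no parallelization
  fit_only    = 14.93494,  # glmmTMBControl(parallel = ...) only
  dredge_only = 13.99638,  # dredge(cluster = clust) only
  both        = 11.60089   # both options together
)

# Speed-up of each option relative to the serial run.
speedup <- medians[["serial"]] / medians

# Speed-up that "both" would show if the two gains multiplied perfectly.
ideal_both <- speedup[["fit_only"]] * speedup[["dredge_only"]]

print(round(speedup, 2))
print(round(ideal_both, 2))
```

On these numbers the combined run is faster than either option alone, but it falls well short of the product of the two individual speed-ups, so the gains do not fully stack.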


## These other options don't work

# Parallel dredge with "cl": Not using cluster, regardless of how I parallelize the model
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
              rank = "AICc",
              cluster = cl)
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
              rank = "AICc",
              cluster = cl)

# Parallel dredge and model with "clust": Doesn't work
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
              rank = "AICc",
              cluster = clust)
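
In the setup above, the two clusters differ only in what their workers have been given: `clust` has the packages loaded and `df` exported, while `cl` has nothing. A small check like the one below shows what a worker can actually see before and after `clusterExport()`. It is a sketch using only the base `parallel` package, with a two-worker cluster named `cl_demo`, a tiny stand-in data frame, and a helper `worker_has_df()`; all three are assumptions made for illustration, not part of the question's code. It also stops the cluster at the end, which the code above never does.

```r
library(parallel)

# Small stand-in for the data frame in the question (illustrative only).
df <- data.frame(y = rnorm(10), x1 = rnorm(10))

# Two workers keep the sketch light; the question uses detectCores() - 1.
cl_demo <- makeCluster(2)

# Does each worker have an object called "df" in its global environment?
# inherits = FALSE avoids matching the stats::df() density function.
worker_has_df <- function(cluster) {
  unlist(clusterEvalQ(cluster,
                      exists("df", envir = globalenv(), inherits = FALSE)))
}

before <- worker_has_df(cl_demo)  # nothing has been exported yet

# Copy df to every worker; envir = environment() looks it up where this code runs.
clusterExport(cl_demo, "df", envir = environment())

after <- worker_has_df(cl_demo)

# Release the worker processes when done.
stopCluster(cl_demo)

print(before)
print(after)
```

Running the same kind of check against `cl` and `clust` would confirm which of them can see `df`, which may help when working out why dredge() treats the two clusters differently.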

performance

optimization

parallel-processing

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-10-27 20:12:38

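You are not adding up the parallelization of dredge() and glmmTMB(); the speed gain comes from exporting the packages and the data.

When you parallelize, all cores but one are kept busy, so when you parallelize a second time there is nothing left to gain, because no cores are free.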
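
The point about cores being used up can be made concrete with a little arithmetic. The sketch below assumes a machine with 8 cores (a made-up figure; `parallel::detectCores()` would report the real one) and uses a hypothetical helper, `split_cores()`, which is not part of glmmTMB or MuMIn. It compares what the question's setup could ask for in the worst case with a split that stays within the core budget.

```r
# Assumed core count for illustration; on a real machine use parallel::detectCores().
total_cores <- 8L

# What the question's setup requests with detectCores() - 1:
dredge_workers <- total_cores - 1L  # workers in "clust"
fit_threads    <- total_cores - 1L  # glmmTMBControl(parallel = length(cl))

# Worst case if every dredge worker also fitted its models with that many
# threads (an upper bound for illustration, not a measured figure).
requested <- dredge_workers * fit_threads

# Hypothetical helper: divide a core budget between dredge workers and
# threads per model fit so that workers * threads never exceeds the budget.
split_cores <- function(total_cores, n_workers) {
  stopifnot(total_cores >= 1, n_workers >= 1)
  n_workers <- min(n_workers, total_cores)
  threads <- max(1, total_cores %/% n_workers)
  c(workers = n_workers, threads_per_fit = threads, in_use = n_workers * threads)
}

budget <- split_cores(total_cores, n_workers = 4)

print(requested)
print(budget)
```

Whether a few workers with several threads each beats many single-threaded workers depends on the model and the data, so benchmarking the candidate splits remains the way to decide.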

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/74222940
