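My goal is to make the dredge() function as fast as possible when it is applied to glmmTMB() models. I understand that both functions can be parallelized: glmmTMB() with the control argument, and dredge() with the cluster argument.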
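My question is: to get maximum speed, can I parallelize both the model fitting and the dredging at the same time? In other words, can I add up (stack, combine) the speed benefits of parallelizing glmmTMB() and dredge()?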
I tried to do this by creating two separate clusters in my R session, and after comparing the various options with microbenchmark(), I seem to have achieved my goal.
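However, since I just copied code from elsewhere, I have no idea what I am doing! I have a background in statistics and R programming, but parallelization is something I have only just started learning. So here are a few questions.
Can this process be made even faster? Is it a good idea to create two clusters in one R session? Can the speed benefits really be added together, or am I just seeing an artifact? Can anyone recommend learning resources for understanding these features better?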
Thanks a lot!
## Load libraries
library(glmmTMB)
library(microbenchmark)
library(multcomp)
library(MuMIn)
library(parallel)
## Create large dataset (idea from the glmmTMB vignette on parallel optimization)
N <- 3e5
x1 <- rnorm(N, 1, 2)
x2 <- rnorm(N, 4, 2)
x3 <- rnorm(N, 10, 2)
y <- 0.3 + 0.4 * x1 - 0.2 * x2 + 0.9 * x3 + rnorm(N, 0, 0.25)
df <- data.frame(y,
x1,
x2,
x3)
## Create two clusters
# create cluster "cl", but export nothing
cl <- parallel::makeCluster((parallel::detectCores() - 1))
# create cluster "clust" and export data and libraries (following documentation of pdredge)
clust <- parallel::makeCluster((parallel::detectCores() - 1))
parallel::clusterEvalQ(clust, library(glmmTMB))
parallel::clusterEvalQ(clust, library(MuMIn))
parallel::clusterExport(clust, "df")
## Compare running times for glmmTMB(): both "cl" and "clust" reduce running times
microbenchmark::microbenchmark(
# No parallel
glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
# Parallel model with "cl"
glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
# Parallel model with "clust"
glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
times = 10
)
Unit: seconds
expr
glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df)
glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl)))
glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust)))
min lq mean median uq max neval cld
4.526190 4.556430 4.625324 4.631528 4.670585 4.745891 10 b
2.271729 2.282912 2.315834 2.293132 2.343508 2.393902 10 a
2.231709 2.288383 2.382596 2.400160 2.459594 2.507514 10 a
## Compare running times when parallelization is attempted
## both for glmmTMB() and dredge()
options(na.action = "na.fail")
microbenchmark::microbenchmark(
# No parallel
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
rank = "AICc"),
# Parallel glmmTMB with "cl"
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
rank = "AICc"),
# Parallel dredge with "clust"
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
rank = "AICc", cluster = clust),
# Both: parallel glmmTMB with "cl", parallel dredge with "clust"
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
rank = "AICc", cluster = clust),
times = 10
)
Unit: seconds
expr
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
rank = "AICc")
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
rank = "AICc")
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df),
rank = "AICc", cluster = clust)
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
rank = "AICc", cluster = clust)
min lq mean median uq max neval cld
24.95914 25.17014 25.41935 25.27549 25.53169 26.47337 10 d
14.21192 14.56461 15.28324 14.93494 15.88009 16.69395 10 c
13.48460 13.66408 14.09466 13.99638 14.30151 15.40998 10 b
11.07945 11.36578 11.75006 11.60089 12.31227 12.55529 10 a
## These other options don't work
# Parallel dredge with "cl": Not using cluster, regardless of how I parallelize the model
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(cl))),
rank = "AICc",
cluster = cl)
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
rank = "AICc",
cluster = cl)
# Parallel dredge and model with "clust": Doesn't work
MuMIn::dredge(global.model = glmmTMB::glmmTMB(y ~ x1 + x2 + x3, data = df, control = glmmTMBControl(parallel = length(clust))),
rank = "AICc",
cluster = clust)
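One plausible reason why dredge() reports "Not using cluster" for "cl" is that its workers never received the data or the packages: a freshly created worker starts with an empty workspace. The sketch below illustrates only that general behaviour with base R's parallel package. The names cl_demo and demo_data are made up for the illustration, and it does not reproduce the internal checks that MuMIn performs.

```r
library(parallel)

# Hypothetical small cluster and data, used only for this illustration
cl_demo <- makeCluster(2)
demo_data <- data.frame(x = 1:10)

# Before exporting, the workers cannot see the object
before <- unlist(clusterEvalQ(cl_demo, exists("demo_data")))

# Copy the object into each worker's global environment
clusterExport(cl_demo, "demo_data", envir = environment())
after <- unlist(clusterEvalQ(cl_demo, exists("demo_data")))

# Always release the worker processes when done
stopCluster(cl_demo)

print(before)
print(after)
```

The same applies to the two clusters in the question: both "cl" and "clust" keep their worker processes alive until stopCluster() is called on each of them.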
Answered on 2022-10-27 20:12:38
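You are not adding up the parallelization of dredge() and glmmTMB(); the speed gain comes from exporting the packages and the data.
When you parallelize, all cores but one are already busy, so there is nothing to gain from parallelizing a second time: no cores are left.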
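To make the point about running out of cores concrete, here is a minimal sketch of splitting a fixed core budget between the two levels instead of requesting all cores at both. The helper split_cores() is hypothetical and not part of any package. It assumes, as I understand the glmmTMB documentation, that the parallel setting in glmmTMBControl() is a thread count rather than a cluster object, so length(cl) in the question only supplies a number.

```r
# Hypothetical helper: divide the usable cores between cluster workers
# (for dredge) and threads per worker (for glmmTMB), leaving one core free.
split_cores <- function(total_cores, n_workers) {
  usable <- max(1L, as.integer(total_cores) - 1L)
  n_workers <- max(1L, min(as.integer(n_workers), usable))
  threads_per_worker <- max(1L, usable %/% n_workers)
  list(workers = n_workers, threads = threads_per_worker)
}

# Example with an assumed 8-core machine and 3 dredge workers
plan <- split_cores(total_cores = 8, n_workers = 3)
print(plan)

# The result could then be used along these lines (not run here):
#   clust <- parallel::makeCluster(plan$workers)
#   control = glmmTMBControl(parallel = plan$threads)
```

With this split, workers times threads never exceeds the usable cores, so the two levels do not compete for the same cores. Whether a given split is actually faster than parallelizing only one level would still need to be benchmarked on the real model.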
https://stackoverflow.com/questions/74222940