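Dear StackOverflow community, as a surgeon who has spent 6 months enthusiastically teaching himself R (through StackOverflow and many other websites), I beg your indulgence for the triviality of my concern.
Background: Simply put, my objective is to run a Cox survival regression on data from cancer patients. Because of the retrospective design, I plan to do 1:3 propensity score matching (PSM). Missing data are handled with multiple imputation ("mice" pkg). PSM is handled with the "MatchThem" pkg. I use the "survey" pkg to fit the survival models (svycoxph() applied through the with() function). This leads to a mimira object, which I can easily print as a nice table with tbl_regression() ("gtsummary" pkg).
Problem: I usually report my Cox regressions both as a hazard ratio table and as a graphical version (forest plot with ggforest(), from the "survminer" pkg), but this time I am really stuck. ggforest() does not recognise the mimira object as a "coxph" object and throws this error: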
Error in ggforest(tbl_regression_object, data = mimira_object) :
inherits(model, "coxph") is not TRUE
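I guess that adding PSM on top of my multiple imputation is the problem, since I had no trouble plotting Cox regressions fitted on multiply imputed data alone (ggforest was able to process the mira object without problems, and so was the pool_and_tidy_mice() function).
Here is the script: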
#Data
library(fabricatr)
library(simsurv)
# Simulate patient data in a clinical trial
participant_data <- fabricate(
N = 2000,
age = runif(N, min = 18, max = 85),
is_female = draw_binary(prob = 0.5, N = N),
is_smoker = draw_binary(prob = 0.2 + 0.2 * (age > 50), N = N),
disease_stage = round(runif(N, min = 1 + 0.5 * (age > 65), max = 4)),
treatment = draw_binary(prob = 0.5, N = N),
kps = runif(N, min = 40, max = 100)
)
# Simulate data in the survival context
survival_data <- simsurv(
lambdas = 0.1, gammas = 1.8,
x = participant_data,
betas = c(is_female = -0.2, is_smoker = 1.2,
treatment = -0.4, kps = -0.005,
disease_stage = 0.2),
maxt = 5)
# Merging df
library(dplyr)
mydata_complete <- bind_cols(survival_data, participant_data)
# generating missing value
library(missMethods)
mydata_uncomp <- delete_MCAR(mydata_complete, 0.3)
mydata <- mydata_uncomp
#1 imputation with "mice"
library(mice)
mydata$nelsonaalen <- nelsonaalen(mydata, eventtime, status)
mydata_mice_imp_m3 <- mice(mydata, maxit = 2, m = 3, seed = 20200801) # m=3 is for testing
#2 matching (PSM 1:3) with "MatchThem"
library(MatchThem)
mydata_imp_m3_psm <- matchthem(treatment ~ age + is_female + disease_stage, data = mydata_mice_imp_m3, approach = "within" ,ratio= 1, method = "optimal")
#3 Pooling Coxph models in multiple imputed datasets and PSM with "survey"
library(survey)
mimira_object <- with(data = mydata_imp_m3_psm, expr = svycoxph(Surv(eventtime, status) ~ age+ is_smoker + disease_stage))
gtsummary::pool_and_tidy_mice(mimira_object, exponentiate = TRUE, conf.int = TRUE) -> pooled_imp_m3_cph
# pool_and_tidy_mice() returns estimates for mimira_object but not the degrees of freedom. Warning message:
In get.dfcom(object, dfcom) : Infinite sample size assumed.
> pooled_imp_m3_cph
           term  estimate   std.error  statistic p.value conf.low conf.high            b  df dfcom fmi    lambda m      riv         ubar
1           age 0.9995807 0.001961343 -0.2138208     NaN      NaN       NaN 1.489769e-06 NaN   Inf NaN 0.5163574 3 1.067643 1.860509e-06
2     is_smoker 2.8626952 0.093476026 11.2516931     NaN      NaN       NaN 4.182884e-03 NaN   Inf NaN 0.6382842 3 1.764601 3.160589e-03
3 disease_stage 1.2386947 0.044092483  4.8547535     NaN      NaN       NaN 8.995628e-04 NaN   Inf NaN 0.6169374 3 1.610540 7.447299e-04
#4 Table summary of the pooled results
library(gtsummary)
# The 95% CI and p-values are missing because of a separate issue in the pooling of the mimira object:
# MatchThem:::get.2dfcom gives dfcom = 999999 (another issue I still need to solve)
tbl_regression_object <- tbl_regression(mimira_object, exponentiate = TRUE, conf.int = TRUE)
#5 What it should look like as a graphical summary
library(survival)
mydata.cox <- coxph(Surv(eventtime, status) ~ age+ is_smoker + disease_stage, mydata_uncomp) # (df mydata_uncomp is without imputation and PSM)
#with gtsummary
forestGT <-
mydata.cox %>%
tbl_regression(exponentiate = TRUE,
add_estimate_to_reference_rows = TRUE) %>%
plot()
(forestGT) # See picture GT_plot1. Almost perfect. Would have been great to know how to add N, 95% CI, HR, p-value and parameters of the model (AIC, events, concordance, etc.)
#with survminer
HRforest <-
survminer::ggforest(mydata.cox, data = mydata_uncomp)
(HRforest) # See picture Ggforest. Everything I need to know about my cox regression is all in there. For me it is just a great regression cox forest plot.
#6 Actually what happens when I do the same thing with imputed and matched df
#with gtsummary
forestGT_imp_psm <-
mimira_object %>%
tbl_regression(exponentiate = TRUE,
add_estimate_to_reference_rows = TRUE) %>%
plot() # WARNING message : In get.dfcom(object, dfcom) : Infinite sample size assumed.
(forestGT_imp_psm) # See picture GT_plot2. The plot is rendered but without the 95% CI
#with survminer
HRforest_imp_psm <-
ggforest(mimira_object, data = mydata_imp_m3_psm) # ERROR:in ggforest(mimira_object, data = mydata_imp_m3_psm) : inherits(model, "coxph") is not TRUE
(HRforest_imp_psm)
#7 The lucky and providential step
# your solution/advise
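I would greatly appreciate your help.
Cheers.
AK
Picture GT_plot1 (embedding images in this post is not allowed, so here is the share link): plot1
Picture Ggforest_plot: plot
Picture GT_plot2: plot2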
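Answered 2021-09-01 17:42:27
There seem to be two distinct problems here:
Problem #1. Getting gtsummary to produce a table with p-values and confidence intervals for the pooled, matched data.
Problem #2. Getting ggforest() to produce a plot of the pooled estimates.
Problem #1:
Let's follow the instructions on page 15 of "MatchThem:: Matching and Weighting after Multiple Imputation" (https://arxiv.org/ftp/arxiv/papers/2009/2009.11772.pdf) and modify your chunk #3. Instead of calling pool_and_tidy_mice(), we do the following: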
matched.results <- pool(mimira_object)
summary(matched.results, conf.int = TRUE)
This produces the following (estimates are on the log hazard scale):
           term      estimate   std.error  statistic        df      p.value        2.5 %     97.5 %
1           age -0.0005997864 0.001448251 -0.4141453 55.266353 6.803707e-01 -0.003501832 0.00230226
2     is_smoker  1.1157796620 0.077943244 14.3152839  9.961064 5.713387e-08  0.942019234 1.28954009
3 disease_stage  0.2360965310 0.051799813  4.5578645  3.879879 1.111782e-02  0.090504018 0.38168904
This means that imputing with mice and then matching with MatchThem works, since you do get p-values and confidence intervals.
Compare this with the output of pool_and_tidy_mice():
           term      estimate   std.error  statistic p.value            b  df dfcom fmi    lambda m
1           age -0.0005997864 0.001448251 -0.4141453     NaN 2.992395e-07 NaN   Inf NaN 0.1902260 3
2     is_smoker  1.1157796620 0.077943244 14.3152839     NaN 2.041627e-03 NaN   Inf NaN 0.4480827 3
3 disease_stage  0.2360965310 0.051799813  4.5578645     NaN 1.444843e-03 NaN   Inf NaN 0.7179644 3
        riv         ubar
1 0.2349124 1.698446e-06
2 0.8118657 3.352980e-03
3 2.5456522 7.567636e-04
Everything is identical except df and p.value, which are not computed in this second table.
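To make the extra columns of that table less mysterious, here is a small base R sketch of Rubin's rules applied to the printed ubar (within-imputation variance) and b (between-imputation variance) values. The numbers are copied by hand from the output above, and the data frame name pooled is just for illustration; this is not the code either package runs internally.

```r
# Within- and between-imputation variances copied from the table above
pooled <- data.frame(
  term = c("age", "is_smoker", "disease_stage"),
  ubar = c(1.698446e-06, 3.352980e-03, 7.567636e-04),
  b    = c(2.992395e-07, 2.041627e-03, 1.444843e-03),
  m    = 3  # number of imputations
)

# Total variance: T = ubar + (1 + 1/m) * b
pooled$t         <- pooled$ubar + (1 + 1 / pooled$m) * pooled$b
pooled$std.error <- sqrt(pooled$t)
# Relative increase in variance due to the missing data
pooled$riv       <- (1 + 1 / pooled$m) * pooled$b / pooled$ubar
# Proportion of the total variance attributable to the missing data
pooled$lambda    <- (1 + 1 / pooled$m) * pooled$b / pooled$t

print(pooled[, c("term", "std.error", "riv", "lambda")])
```

The std.error, riv and lambda columns computed this way agree with the printed ones up to rounding, which is why both pooling functions report the same estimates and standard errors and differ only in the degrees of freedom.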
So I think this is an issue in pool_and_tidy_mice(), and you should report it as an issue on the gtsummary GitHub repository.
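One possible explanation for the NaN values, offered as an assumption rather than a diagnosis, is the complete-data degrees of freedom (dfcom). The sketch below writes out the textbook Barnard-Rubin adjustment in base R and evaluates it twice: with dfcom = 999999 (the value the question says MatchThem:::get.2dfcom returns) and with dfcom = Inf (as in the "Infinite sample size assumed" warning). The function name barnard_rubin_df is made up for this example, and the lambda values are copied from the table above.

```r
# Textbook Barnard-Rubin adjusted degrees of freedom.
#   m      number of imputations
#   lambda proportion of variance attributable to the missing data
#   dfcom  complete-data degrees of freedom
barnard_rubin_df <- function(m, lambda, dfcom) {
  df_old <- (m - 1) / lambda^2
  df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  df_old * df_obs / (df_old + df_obs)
}

lambda <- c(age = 0.1902260, is_smoker = 0.4480827, disease_stage = 0.7179644)

# Large but finite dfcom, as reported for MatchThem in the question
df_finite <- barnard_rubin_df(m = 3, lambda = lambda, dfcom = 999999)
# Infinite dfcom plugged naively into the same formula
df_infinite <- barnard_rubin_df(m = 3, lambda = lambda, dfcom = Inf)

print(df_finite)
print(df_infinite)
```

With the finite value, the result matches the df column of summary(pool(mimira_object)) to about three decimals. With Inf, the ratio (Inf + 1) / (Inf + 3) is NaN in R, so the whole expression becomes NaN unless the infinite case is handled separately. Whether that is what actually happens inside pool_and_tidy_mice() would need to be confirmed by the package authors.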
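For now, you can work around this by changing svycoxph() to survival::coxph() in chunk #3 when calling with(). If you do that, you end up with a gtsummary table that has p-values and confidence intervals. Ultimately, the problem may be some interaction between svycoxph() and pool_and_tidy_mice(), which is why I think you should post this on GitHub.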
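Problem #2:
The short answer is that you cannot have a ggforest plot containing all the data you are looking for.
https://www.rdocumentation.org/packages/mice/versions/3.13.0/topics/pool reads:
"A common error is to reverse steps 2 and 3, i.e., to pool the multiply-imputed data instead of the estimates. Doing so may severely bias the estimates of scientific interest and yield incorrect statistical intervals and p-values. The pool() function will detect this case."
This means that there is no "real" dataset behind the pooled estimates (that is, you cannot really combine the datasets from imputations 1-3). It follows that ggforest() cannot produce the plot you want: it needs a dataset, and you cannot give it a pooled one, because pooling the data would lead to wrong estimates.
What you can do is provide one ggforest plot per imputation (so with 3 imputations you get 3 slightly different ggforest plots) and then add the plot of the pooled estimates using plot(), as suggested above.
To create each ggforest plot, you need the following line of code: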
ggforest(mimira_object$analyses[[1]], complete(mydata_imp_m3_psm, 1))
This creates the ggforest plot for your first imputation. Change the number to 2 and 3 to inspect the remaining imputations.
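If you also want a forest-style figure of the pooled estimates that shows the confidence intervals, one option is to draw it yourself from the pooled summary. The base R sketch below assumes the log hazard ratios and 95% limits from summary(pool(mimira_object), conf.int = TRUE), copied by hand from the table above; the object names pooled and hr_table are made up for this example. It is not a replacement for ggforest(), since it shows no N, events or concordance.

```r
# Pooled log hazard ratios and 95% limits copied from the pool() summary
pooled <- data.frame(
  term      = c("age", "is_smoker", "disease_stage"),
  estimate  = c(-0.0005997864, 1.1157796620, 0.2360965310),
  conf.low  = c(-0.003501832, 0.942019234, 0.090504018),
  conf.high = c(0.00230226, 1.28954009, 0.38168904)
)

# Move from the log scale to hazard ratios
hr_table <- data.frame(
  term = pooled$term,
  hr   = exp(pooled$estimate),
  low  = exp(pooled$conf.low),
  high = exp(pooled$conf.high)
)

# One row per term, first term at the top
y <- rev(seq_len(nrow(hr_table)))
old_par <- par(mar = c(5, 8, 4, 2))  # widen the left margin for term labels
plot(hr_table$hr, y, log = "x", pch = 15,
     xlim = range(hr_table$low, hr_table$high),
     ylim = c(0.5, nrow(hr_table) + 0.5),
     yaxt = "n", xlab = "Hazard ratio (95% CI, log scale)", ylab = "",
     main = "Pooled Cox model after imputation and matching")
segments(hr_table$low, y, hr_table$high, y)  # confidence intervals
abline(v = 1, lty = 2)                       # reference line: no effect
axis(2, at = y, labels = hr_table$term, las = 1)
par(old_par)
```

Because the intervals come from the pooled model, this figure stays consistent with the pooled table, while the per-imputation ggforest plots show how much the individual fits vary.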
I hope this helps,
Alex
https://stackoverflow.com/questions/68936768