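I am using finalfit, which is great for producing model output, fits and so on. In the simpler case below, though, I just have a continuous outcome as my dependent variable. 1) With a factor explanatory variable, can specifying p = TRUE give p-values across the label levels? 2) How can a different type of test be specified? The vignette says it uses "Kruskal/Mann for continuous"; if we want a different test, how would we specify it? 3) Similarly, when the dependent variable is continuous, is there a way to add counts and percentages of the explanatory variable after the levels column?
Example code that produces a p-value column when the dependent variable is a factor: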
library(finalfit)
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
t <- colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE)
t
#   label        levels       No           Yes          p
# 1 Age (years)  Mean (SD)    59.8 (11.9)  58.4 (13.3)  0.578
# 2 Age          <40 years    68 (97.1)    2 (2.9)      1.000
# 3              40-59 years  334 (97.1)   10 (2.9)
# 4              60+ years    500 (97.1)   15 (2.9)
# 7 Sex          Female       432 (97.1)   13 (2.9)     0.979
# 8              Male         470 (97.1)   14 (2.9)
# 5 Obstruction  No           715 (97.7)   17 (2.3)     0.018
# 6              Yes          166 (94.3)   10 (5.7)
Now, with a continuous dependent variable, I would like to have a p-value column as well:
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "time"
## Crosstable
table_1 <- colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE)
table_1
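#   label        levels       Mean (sd)
# 1 Age          <40 years    1544.3 (867.6)
# 2              40-59 years  1769.4 (861.1)
# 3              60+ years    1620.6 (875.3)
# 4 Sex          Female       1674.2 (884.6)
# 5              Male         1666.0 (861.4)
# 6 Obstruction  No           1700.0 (852.4)
# 7              Yes          1515.6 (933.6)
# 8 Perforation  No           1671.2 (873.1)
# 9              Yes          1627.6 (851.8)
In this case p = TRUE does not work. I expected the p-value column to run a test, for example comparing the mean/median of time between the levels of Sex, according to the suggested test type. It would also be useful to add count and proportion columns after the levels column, e.g. Female 445 (47%).
Thanks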
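Posted 2020-01-30 21:20:10
finalfit 1.0.0 has reached preview and can be installed from GitHub as shown here. I would be grateful if you could take a look and make sure it is doing what it should. There has been quite a lot of rewriting/improvement, so there may be bugs.
Install
remotes::install_github('ewenharrison/finalfit')
Hypothesis tests with a continuous dependent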
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "time"
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE)
label        levels       unit       value           p
Age (years)  [18.0,85.0]  Mean (sd)  1670.0 (872.1)  0.543
Age          <40 years    Mean (sd)  1544.3 (867.6)  0.023
             40-59 years  Mean (sd)  1769.4 (861.1)
             60+ years    Mean (sd)  1620.6 (875.3)
Sex          Female       Mean (sd)  1674.2 (884.6)  0.886
             Male         Mean (sd)  1666.0 (861.4)
Obstruction  No           Mean (sd)  1700.0 (852.4)  0.012
             Yes          Mean (sd)  1515.6 (933.6)
Perforation  No           Mean (sd)  1671.2 (873.1)  0.798
             Yes          Mean (sd)  1627.6 (851.8)
Continuous versus continuous gives the p-value for a Pearson correlation coefficient (likely to be of limited use). Continuous versus categorical is ANOVA by default, p_cont_para = "aov". When the explanatory factor has only two levels, this can be changed to p_cont_para = "t.test" for a Welch t-test.
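For orientation, here is roughly what those three tests look like in base R. This is only a sketch on simulated data, not finalfit's internal code; the data frame dat and its columns (time, age, sex, age_f) are made up for illustration.

```r
# Illustrative base R versions of the tests named above.
# Simulated data; all names here are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(
  time  = rnorm(n, mean = 1670, sd = 870),   # continuous dependent
  age   = runif(n, min = 18, max = 85),      # continuous explanatory
  sex   = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  age_f = factor(sample(c("<40", "40-59", "60+"), n, replace = TRUE))
)

# Continuous vs continuous: p-value for the Pearson correlation
p_pearson <- cor.test(dat$time, dat$age, method = "pearson")$p.value

# Continuous vs categorical: one-way ANOVA
p_aov <- summary(aov(time ~ age_f, data = dat))[[1]][["Pr(>F)"]][1]

# Two-level factor: Welch t-test (t.test() uses var.equal = FALSE by default)
p_welch <- t.test(time ~ sex, data = dat)$p.value

round(c(pearson = p_pearson, aov = p_aov, welch = p_welch), 3)
```

With a two-level factor, ANOVA assumes equal variances while the Welch test does not, so the two p-values will usually differ slightly.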
Non-parametric options
Set all summaries to median (IQR), and use Spearman for cont/cont and Kruskal-Wallis for cont/cat.
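As a point of reference, the two tests named here are available in base R as cor.test(method = "spearman") and kruskal.test(). The sketch uses simulated count-like data with made-up names (dat, nodes, age, age_f) and is not taken from finalfit's source.

```r
# Illustrative base R versions of the non-parametric tests.
# Simulated data; all names here are hypothetical.
set.seed(2)
n <- 150
dat <- data.frame(
  nodes = rpois(n, lambda = 3),              # skewed, count-like dependent
  age   = runif(n, min = 18, max = 85),
  age_f = factor(sample(c("<40", "40-59", "60+"), n, replace = TRUE))
)

# Continuous vs continuous: Spearman rank correlation.
# exact = FALSE avoids the warning about ties in the count data.
p_spearman <- cor.test(dat$nodes, dat$age,
                       method = "spearman", exact = FALSE)$p.value

# Continuous vs categorical: Kruskal-Wallis rank sum test
p_kw <- kruskal.test(nodes ~ age_f, data = dat)$p.value

round(c(spearman = p_spearman, kruskal = p_kw), 3)
```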
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE,
cont = "median")
label        levels       unit          value      p
Age (years)  [18.0,85.0]  Median (IQR)  2.0 (4.0)  0.001
Age          <40 years    Median (IQR)  3.0 (5.0)  0.027
             40-59 years  Median (IQR)  3.0 (4.0)
             60+ years    Median (IQR)  2.0 (3.0)
Sex          Female       Median (IQR)  3.0 (4.0)  0.777
             Male         Median (IQR)  2.0 (3.0)
Obstruction  No           Median (IQR)  2.0 (4.0)  0.774
             Yes          Median (IQR)  3.0 (3.0)
Perforation  No           Median (IQR)  2.0 (4.0)  0.125
             Yes          Median (IQR)  3.0 (2.0)
Show the quartile range (Q1 to Q3) rather than the single value Q3-Q1.
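To make the difference concrete, here is a small base R sketch of the two ways of reporting spread. The vector x is made up, and the formatting only imitates the table cells; it is not how finalfit builds them.

```r
# Two ways to report the spread around a median (illustrative values).
x <- c(0, 1, 1, 2, 3, 4, 5, 7, 12)

# Quartiles Q1 and Q3 (R's default quantile type 7)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)

# Single number: IQR() returns Q3 - Q1
iqr_single <- sprintf("%.1f (%.1f)", median(x), IQR(x))

# Range: report Q1 and Q3 themselves
iqr_range <- sprintf("%.1f (%.1f to %.1f)", median(x), q[1], q[2])

iqr_single  # "3.0 (4.0)"
iqr_range   # "3.0 (1.0 to 5.0)"
```

In finalfit the range form is requested with cont_range = TRUE, as in the call that follows.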
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE,
cont = "median", cont_range = TRUE)
label        levels       unit          value             p
Age (years)  [18.0,85.0]  Median (IQR)  2.0 (1.0 to 5.0)  0.001
Age          <40 years    Median (IQR)  3.0 (2.0 to 7.0)  0.027
             40-59 years  Median (IQR)  3.0 (1.0 to 5.0)
             60+ years    Median (IQR)  2.0 (1.0 to 4.0)
Sex          Female       Median (IQR)  3.0 (1.0 to 5.0)  0.777
             Male         Median (IQR)  2.0 (1.0 to 4.0)
Obstruction  No           Median (IQR)  2.0 (1.0 to 5.0)  0.774
             Yes          Median (IQR)  3.0 (1.0 to 4.0)
Perforation  No           Median (IQR)  2.0 (1.0 to 5.0)  0.125
             Yes          Median (IQR)  3.0 (2.0 to 4.0)
Specify which variables to treat as non-parametric.
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE,
cont_nonpara = c(1, 3, 5))
label        levels       unit          value      p
Age (years)  [18.0,85.0]  Median (IQR)  2.0 (4.0)  0.001
Age          <40 years    Mean (sd)     4.7 (4.5)  0.034
             40-59 years  Mean (sd)     3.6 (3.3)
             60+ years    Mean (sd)     3.6 (3.6)
Sex          Female       Median (IQR)  3.0 (4.0)  0.777
             Male         Median (IQR)  2.0 (3.0)
Obstruction  No           Mean (sd)     3.7 (3.7)  0.435
             Yes          Mean (sd)     3.5 (3.2)
Perforation  No           Median (IQR)  2.0 (4.0)  0.125
             Yes          Median (IQR)  3.0 (2.0)
Adding counts and missing data
This turned out to be surprisingly complicated, particularly when data are missing in both the dependent and the explanatory variables. Here is the current thinking.
explanatory = c("age", "nodes", "sex.factor", "obstruct.factor")
dependent = "time"
# Change 29 rows in variable time to missing
colon_s[1:29, "time"] = NA
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE,
total_col = TRUE, add_row_totals = TRUE)
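label        Total N  Missing N  levels       unit       value           Total        p
Age (years)  900      0          [18.0,85.0]  Mean (sd)  1663.6 (860.2)  900 (100.0)  0.766
nodes        882      18         [0.0,33.0]   Mean (sd)  1663.6 (860.2)  882 (100.0)  <0.001
Sex          900      0          Female       Mean (sd)  1674.6 (873.8)  432 (48.0)   0.712
                                 Male         Mean (sd)  1653.4 (848.3)  468 (52.0)
Obstruction  881      19         No           Mean (sd)  1687.7 (840.8)  713 (80.9)   0.039
                                 Yes          Mean (sd)  1535.8 (924.0)  168 (19.1)
The Total N column (add_row_totals = TRUE) is the total number of values for that explanatory variable. Here the data frame is 929 rows long, but 29 rows are missing the dependent. Missing N counts missing values in that variable only and does not include rows where the dependent is missing. The Total column (total_col = TRUE) is historical; the name and argument are kept for compatibility. It shows the proportional split across the levels of each factor.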
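The counting rule described here can be reproduced by hand. This is a sketch under my reading of the description (drop rows with a missing dependent first, then count within each explanatory variable), using simulated data and made-up names; it is not finalfit's implementation.

```r
# Hand-rolled Total N / Missing N / level percentages (illustrative only).
set.seed(3)
n <- 100
dat <- data.frame(
  time     = rnorm(n, mean = 1670, sd = 870),
  obstruct = factor(sample(c("No", "Yes"), n, replace = TRUE,
                           prob = c(0.8, 0.2)))
)
dat$time[1:10] <- NA               # 10 rows missing the dependent
dat$obstruct[c(5, 20, 30)] <- NA   # row 5 is also missing the dependent

# Rows with a missing dependent are dropped before anything is counted
kept <- dat[!is.na(dat$time), ]

total_n   <- sum(!is.na(kept$obstruct))  # non-missing explanatory values
missing_n <- sum(is.na(kept$obstruct))   # missing in this variable only

# Counts and percentages per level, out of the non-missing values
counts <- table(kept$obstruct)
pct    <- round(100 * prop.table(counts), 1)

c(total_n = total_n, missing_n = missing_n)
rbind(n = counts, percent = pct)
```

Note that row 5 is not counted in missing_n, because it has already been removed for having no dependent value.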
Show missing data
The following makes missing data in the explanatory factors explicit, but does not pass it to the hypothesis test.
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE,
total_col = TRUE, add_row_totals = TRUE,
na_include = TRUE)
label        Total N  Missing N  levels       unit       value           Total        p
Age (years)  900      0          [18.0,85.0]  Mean (sd)  1663.6 (860.2)  900 (100.0)  0.766
nodes        882      18         [0.0,33.0]   Mean (sd)  1663.6 (860.2)  882 (100.0)  <0.001
Sex          900      0          Female       Mean (sd)  1674.6 (873.8)  432 (48.0)   0.712
                                 Male         Mean (sd)  1653.4 (848.3)  468 (52.0)
Obstruction  881      19         No           Mean (sd)  1687.7 (840.8)  713 (79.2)   0.039
                                 Yes          Mean (sd)  1535.8 (924.0)  168 (18.7)
                                 (Missing)    Mean (sd)  1890.4 (916.5)  19 (2.1)
Pass the missing data level to the hypothesis test
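Conceptually, this means the test sees "(Missing)" as one more group. A rough base R analogy, on simulated data with made-up names (time, obstruct), is to compare an ANOVA that drops missing rows with one where NA has been turned into an explicit level via addNA(). This only illustrates the idea and is not how finalfit does it internally.

```r
# Missing values excluded from the test vs. treated as their own level.
set.seed(4)
n <- 120
time     <- rnorm(n, mean = 1670, sd = 870)
obstruct <- factor(sample(c("No", "Yes"), n, replace = TRUE))
obstruct[sample(n, 15)] <- NA            # 15 missing explanatory values

# Default behaviour: rows with NA are dropped, two groups are compared
fit_excl <- summary(aov(time ~ obstruct))[[1]]
p_excl   <- fit_excl[["Pr(>F)"]][1]

# Make NA an explicit level so it becomes a third group in the test
obstruct_na <- addNA(obstruct)
levels(obstruct_na)[is.na(levels(obstruct_na))] <- "(Missing)"
fit_incl <- summary(aov(time ~ obstruct_na))[[1]]
p_incl   <- fit_incl[["Pr(>F)"]][1]

table(obstruct_na)
round(c(excluded = p_excl, included = p_incl), 3)
```

Because the number of groups changes, the p-value generally changes too, which matches the Obstruction row differing between the two finalfit tables.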
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE,
total_col = TRUE, add_row_totals = TRUE,
na_include = TRUE, na_to_p = TRUE)
label        Total N  Missing N  levels       unit       value           Total        p
Age (years)  900      0          [18.0,85.0]  Mean (sd)  1663.6 (860.2)  900 (100.0)  0.766
nodes        882      18         [0.0,33.0]   Mean (sd)  1663.6 (860.2)  882 (100.0)  <0.001
Sex          900      0          Female       Mean (sd)  1674.6 (873.8)  432 (48.0)   0.712
                                 Male         Mean (sd)  1653.4 (848.3)  468 (52.0)
Obstruction  881      19         No           Mean (sd)  1687.7 (840.8)  713 (79.2)   0.061
                                 Yes          Mean (sd)  1535.8 (924.0)  168 (18.7)
                                 (Missing)    Mean (sd)  1890.4 (916.5)  19 (2.1)
Complete case analysis
This is very useful when moving on to a linear model: it shows which data are actually included in the model after listwise deletion.
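To see what listwise deletion means in practice, base R's complete.cases() keeps only rows with no missing value in any of the model variables, and lm() drops the same rows by default. The data and names in this sketch (dat, time, age, nodes, sex) are simulated and hypothetical.

```r
# Listwise deletion by hand, checked against what lm() actually uses.
set.seed(5)
n <- 50
dat <- data.frame(
  time  = rnorm(n, mean = 1670, sd = 870),
  age   = runif(n, min = 18, max = 85),
  nodes = rpois(n, lambda = 3),
  sex   = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
dat$time[1:3] <- NA
dat$nodes[c(3, 10, 11)] <- NA   # row 3 is missing in both variables

# Keep rows that are complete across every variable in the model
vars <- c("time", "age", "nodes", "sex")
cc <- dat[complete.cases(dat[, vars]), ]
nrow(cc)   # 45: rows 1, 2, 3, 10 and 11 are dropped

# lm() applies the same deletion through its default na.action
fit <- lm(time ~ age + nodes + sex, data = dat)
nobs(fit) == nrow(cc)
```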
colon_s %>%
summary_factorlist(dependent, explanatory, p = TRUE,
total_col = TRUE, add_row_totals = TRUE,
na_complete_case = TRUE)
label        Total N  Missing N  levels       unit       value           Total        p
Age (years)  863      37         [18.0,85.0]  Mean (sd)  1664.2 (856.1)  863 (100.0)  0.766
nodes        863      37         [0.0,33.0]   Mean (sd)  1664.2 (856.1)  863 (100.0)  <0.001
Sex          863      37         Female       Mean (sd)  1672.6 (869.5)  414 (48.0)   0.712
                                 Male         Mean (sd)  1656.5 (844.4)  449 (52.0)
Obstruction  863      37         No           Mean (sd)  1691.3 (837.9)  699 (81.0)   0.039
                                 Yes          Mean (sd)  1548.7 (923.3)  164 (19.0)
More details: https://finalfit.org/
Any thoughts appreciated.
Posted 2019-11-29 21:15:52
All of this functionality will be available in finalfit 1.0.0, which will be released shortly, perhaps before Christmas. Thanks for your interest.
https://stackoverflow.com/questions/59108562