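I have a dataset covering several diseases, where 0 means the person does not have the disease and 1 means they do.
To illustrate: I am interested in disease A, and in whether people in the dataset have it on its own or as a consequence of another disease. I therefore want to create a new variable "Type" with the values "NotDiseasedWithA", "Primary" and "Secondary". The diseases that can cause A are stored in the vector "SecondaryCauses":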
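SecondaryCauses = c("DiseaseB", "DiseaseD")
"NotDiseasedWithA" means the person does not have disease A. "Primary" means they have disease A but none of the diseases known to cause it. "Secondary" means they have disease A together with at least one disease that can cause it.
Sample data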
ID DiseaseA DiseaseB DiseaseC DiseaseD DiseaseE
1 0 1 0 0 0
2 1 0 0 0 1
3 1 0 1 1 0
4 1 0 1 1 1
5 0 0 0 0 0
My question:
I tried the following, but it does not work:
DF %>% mutate(Type = ifelse(DiseaseA == 0, "NotDiseasedWithA", ifelse(sum(names(DF) %in% SecondaryCauses) > 0, "Secondary", "Primary")))
In the end, I would like to get this result:
ID DiseaseA DiseaseB DiseaseC DiseaseD DiseaseE Type
1 0 1 0 0 0 NotDiseasedWithA
2 1 0 0 0 1 Primary
3 1 0 1 1 0 Secondary
4 1 0 1 1 1 Secondary
5 0 0 0 0 0 NotDiseasedWithA
Posted on 2022-02-15 10:54:04
Using data.table
df <- structure(list(ID = 1:5, DiseaseA = c(0L, 1L, 1L, 1L, 0L), DiseaseB = c(1L,
0L, 0L, 0L, 0L), DiseaseC = c(0L, 0L, 1L, 1L, 0L), DiseaseD = c(0L,
0L, 1L, 1L, 0L), DiseaseE = c(0L, 1L, 0L, 1L, 0L)), row.names = c(NA,
-5L), class = c("data.frame"))
library(data.table)
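setDT(df) # convert df to a data.table by reference
SecondaryCauses <- c("DiseaseB", "DiseaseD")
# Rows without disease A
df[DiseaseA == 0, Type := "NotDiseasedWithA"]
# Rows with disease A: "Secondary" if any column in SecondaryCauses is 1, otherwise "Primary"
df[DiseaseA == 1,
   Type := ifelse(rowSums(.SD) > 0, "Secondary", "Primary"),
   .SDcols = SecondaryCauses]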
df
# ID DiseaseA DiseaseB DiseaseC DiseaseD DiseaseE Type
# 1: 1 0 1 0 0 0 NotDiseasedWithA
# 2: 2 1 0 0 0 1 Primary
# 3: 3 1 0 1 1 0 Secondary
# 4: 4 1 0 1 1 1 Secondary
# 5: 5 0 0 0 0 0 NotDiseasedWithA
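As for why the attempt in the question fails: names(DF) %in% SecondaryCauses tests the column names rather than the values in each row, so the sum is the same for every row and every person with disease A ends up as "Secondary". Below is a minimal base R sketch of a row-wise check, assuming the columns hold only 0/1 as in the sample data. The helper name has_cause is introduced here for illustration and is not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
  ID = 1:5,
  DiseaseA = c(0L, 1L, 1L, 1L, 0L),
  DiseaseB = c(1L, 0L, 0L, 0L, 0L),
  DiseaseC = c(0L, 0L, 1L, 1L, 0L),
  DiseaseD = c(0L, 0L, 1L, 1L, 0L),
  DiseaseE = c(0L, 1L, 0L, 1L, 0L)
)
SecondaryCauses <- c("DiseaseB", "DiseaseD")

# The check from the question inspects column names, not row values,
# so it yields one value for the whole data frame
names_check <- sum(names(df) %in% SecondaryCauses) > 0

# Row-wise check (has_cause is a hypothetical helper name):
# TRUE when the person has at least one of the secondary causes
has_cause <- rowSums(df[SecondaryCauses]) > 0

# Classify: no disease A first, then split the rest by has_cause
df$Type <- ifelse(df$DiseaseA == 0, "NotDiseasedWithA",
                  ifelse(has_cause, "Secondary", "Primary"))
df
```

The same idea should carry over to the dplyr pipeline from the question: replace the names-based test with a per-row test on the columns listed in SecondaryCauses.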
https://stackoverflow.com/questions/71124276