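I have a dataset with the ID in column 1, the diagnosis in column 2, the cause in column 3, and so on. For IDs with several diagnoses, each diagnosis is listed on its own row, so the same ID appears on multiple rows. I would like one row per ID, with the diagnoses spread across columns. Is this possible?
My data looks like this: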
ID diagnosis cause_of_diagnosis
1 A A
1 B B
1 C C
2 A A
3 A A
3 B B
3 C C
I would like the data to end up looking like this:
ID diagnosis_1 diagnosis_2 diagnosis_3 cause_of_diagnosis_1 cause_of_diagnosis_2 cause_of_diagnosis_3
1 A B C A B C
2 A - - A - -
3 A B C A B C
Posted on 2019-09-11 08:34:37
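We can reshape the data with dplyr and tidyr. Use gather to bring the data into long format, group_by ID and key to build a new column name for each "diagnosis" and "cause_of_diagnosis" entry, and then spread the data back into wide format.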
library(dplyr)
library(tidyr)
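df %>%
  gather(key, value, -ID) %>%                               # long format: one row per ID/column/value
  group_by(ID, key) %>%
  mutate(key1 = paste(key, row_number(), sep = "_")) %>%    # e.g. diagnosis_1, diagnosis_2, ...
  ungroup() %>%
  select(-key) %>%                                          # drop the old key so spread uses key1 only
  spread(key1, value)                                       # back to wide format: one row per ID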
#     ID cause_of_diagnosis_1 cause_of_diagnosis_2 cause_of_diagnosis_3 diagnosis_1 diagnosis_2 diagnosis_3
#  <int> <chr>                <chr>                <chr>                <chr>       <chr>       <chr>
#1     1 A                    B                    C                    A           B           C
#2     2 A                    NA                   NA                   A           NA          NA
#3     3 A                    B                    C                    A           B           C
data
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L, 3L),
    diagnosis = structure(c(1L, 2L, 3L, 1L, 1L, 2L, 3L),
        .Label = c("A", "B", "C"), class = "factor"),
    cause_of_diagnosis = structure(c(1L, 2L, 3L, 1L, 1L, 2L, 3L),
        .Label = c("A", "B", "C"), class = "factor")),
    class = "data.frame", row.names = c(NA, -7L))
https://stackoverflow.com/questions/57879603
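As a possible alternative on newer tidyr releases (1.0.0 and later), where gather and spread are superseded by pivot_longer and pivot_wider, the same reshape can probably be done in a single pivot_wider call. The following is only a sketch under a few assumptions: the example data is rebuilt with plain character columns so the snippet is self-contained, the helper column `n` and the result name `wide` are names chosen for illustration, and the "-" fill is optional (leave out values_fill to get NA instead).

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt with character columns (assumption:
# character rather than factor, so that "-" can be used as a fill value).
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 3L, 3L, 3L),
  diagnosis = c("A", "B", "C", "A", "A", "B", "C"),
  cause_of_diagnosis = c("A", "B", "C", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(ID) %>%
  mutate(n = row_number()) %>%   # position of each diagnosis within its ID
  ungroup() %>%
  pivot_wider(
    names_from = n,                                  # suffix: 1, 2, 3, ...
    values_from = c(diagnosis, cause_of_diagnosis),  # columns to spread
    # optional: show "-" instead of NA where an ID has fewer diagnoses
    values_fill = list(diagnosis = "-", cause_of_diagnosis = "-")
  )

wide
```

With this approach the columns should come out as diagnosis_1 to diagnosis_3 followed by cause_of_diagnosis_1 to cause_of_diagnosis_3, which matches the layout asked for in the question, and no intermediate long format is needed.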