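I have a data frame in which some columns define groups, and other columns (a1-a4 in the example data below) have a value in only one row of each group and NA in the rest.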
structure(list(gp = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "gp1", class = "factor"), id = c(1, 1, 2, 2, 2, 2, 3, 3, 3), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), a1 = c(0.4, NA, NA, NA, NA, NA, 0.3, NA, NA), a2 = c(NA, NA, NA, 1, NA, NA, NA, NA, NA), a3 = c(NA, 1.2, NA, NA, NA, NA, NA, NA, NA), a4 = c(NA, NA, 1, NA, NA, NA, NA, NA, 1)), .Names = c("gp", "id", "name", "a1", "a2", "a3", "a4"), row.names = c(NA, -9L), class = "data.frame")
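Since I know that only one row in each of the a columns actually has a value, and I do not need the separate rows, I would like to collect all the values in a group into a single row. I am hoping for something like the following.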
structure(list(gp = structure(c(1L, 1L, 1L), .Label = "gp1", class = "factor"), id = c(1, 2, 3), name = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), a1 = c(0.4, NA, 0.3), a2 = c(NA, 1, NA), a3 = c(1.2, NA, NA), a4 = c(NA, 1, 1)), .Names = c("gp", "id", "name", "a1", "a2", "a3", "a4"), row.names = c(NA, -3L), class = "data.frame")
How can I do this? A solution that uses the tidyverse would be great.
Answered 2018-06-09 07:40:47
library(dplyr)
library(purrr)  # provides lift()
# dat is the data frame from the question
dat %>%
  group_by(gp, id, name) %>%
  summarise_all(funs(lift(coalesce)(.)))
# A tibble: 3 x 7
# Groups: gp, id [?]
  gp       id name     a1    a2    a3    a4
  <fct> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 gp1      1. A     0.400    NA  1.20    NA
2 gp1      2. B        NA    1.    NA    1.
3 gp1      3. C     0.300    NA    NA    1.
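In the code above, `lift(coalesce)(.)` passes the elements of each column as separate arguments to `coalesce()`, which returns the first non-missing one. `funs()` and `lift()` have since been deprecated (in dplyr 0.8.0 and purrr 1.0.0 respectively), so the following is a minimal sketch of the same idea with `across()`, assuming dplyr 1.0.0 or later. The helper `first_non_na` and the reduced data frame are hypothetical, introduced only for illustration.

```r
library(dplyr)

# Reduced stand-in for the question's data: one grouping column, three value columns
dat <- data.frame(
  id = c(1, 1, 2, 2),
  a1 = c(0.4, NA, NA, NA),
  a2 = c(NA, NA, NA, 1),
  a3 = c(NA, 1.2, NA, NA)
)

# Hypothetical helper: first non-missing value of a vector, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

# Collapse each group to a single row, keeping the first non-missing value per column
res <- dat %>%
  group_by(id) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

print(res)
```

With the full data from the question, the grouping call would be `group_by(gp, id, name)`, and `everything()` could be narrowed to `starts_with("a")` if only some columns should be collapsed.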
Answered 2018-06-09 05:05:36
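Since the OP mentioned that only one row can have a value, one option is to use dplyr::first after applying group_by. I prefer summarise_at, because it makes it easy to exclude columns that should not be summarised.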
library(dplyr)
# df is the data frame from the question
df %>%
  group_by(gp, id, name) %>%
  summarise_at(vars(starts_with("a")), funs(dplyr::first(sort(.)))) %>%
  as.data.frame()
#    gp id name  a1 a2  a3 a4
# 1 gp1  1    A 0.4 NA 1.2 NA
# 2 gp1  2    B  NA  1  NA  1
# 3 gp1  3    C 0.3 NA  NA  1
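The `sort()` call is what makes `first()` skip the missing values: by default `sort()` drops NA, so the first element of the sorted vector is a real value whenever one exists. A small sketch of that behaviour follows; the vectors `x`, `y`, and `z` are made up for illustration. Note that sorting also reorders the data, so if a group ever held more than one value, this approach would return the smallest rather than the first one encountered.

```r
library(dplyr)

x <- c(NA, 1.2, NA)
first_raw    <- first(x)        # NA: the first element itself is missing
first_sorted <- first(sort(x))  # 1.2: sort() dropped the NAs

y <- c(NA, 3, 1)
smallest <- first(sort(y))      # 1: the smallest value, not the first non-NA (3)

z <- c(NA_real_, NA_real_)
all_missing <- first(sort(z))   # NA: nothing left after sorting

print(c(first_raw, first_sorted, smallest, all_missing))
```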
https://stackoverflow.com/questions/50767583