Consider this simple example:
library(tidyverse)
mytest <- tibble(group = c('a', 'a', 'a', 'b', 'b', 'b'),
x = c(NA,NA,NA,5,6,7),
other_var = c(NA, NA, NA, 1,2,3),
y = c(3,5,6,NA,NA,NA),
another_var = c(1,2,3, NA,NA,NA),
label_x = c('hello','hello','hello','world','world','world'),
label_y =c('bada','bada','bada','boom','boom','boom'),
label_other_var = c('ak','ak','ak','run','run','run'),
label_another_var = c('noo','noo','noo','bie','bie','bie'))
# A tibble: 6 x 9
group x other_var y another_var label_x label_y label_other_var label_another_var
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 a NA NA 3 1 hello bada ak noo
2 a NA NA 5 2 hello bada ak noo
3 a NA NA 6 3 hello bada ak noo
4 b 5 1 NA NA world boom run bie
5 b 6 2 NA NA world boom run bie
6 b 7 3 NA NA world boom run bie
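Here, I need to nest() this data by group and then extract, within each nested data frame, the column names of the variables that are not NA. The trick is that the actual names of the variables are stored in the label_ columns.
For example, this is the desired output: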
# A tibble: 4 x 2
group var
<chr> <chr>
1 a bada
2 a noo
3 b world
4 b run
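Indeed, take group a as an example. The only non-missing variables are y and another_var. However, the name of y is bada (as shown in the label_y variable), and the name of another_var is noo. The same reasoning applies to group b.
I am not sure how to do this with a map call after running: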
mytest %>% group_by(group) %>% nest()
# A tibble: 2 x 2
group data
<chr> <list>
1 a <tibble [3 x 8]>
2 b <tibble [3 x 8]>
Any ideas? Thanks!
Edit: the original, smaller tibble was as follows:
mytest <- tibble(group = c('a', 'a', 'a', 'b', 'b', 'b'),
x = c(NA,NA,NA,5,6,7),
y = c(3,5,6,NA,NA,NA),
label_x = c('hello','hello','hello','world','world','world'),
label_y = c('bada','bada','bada','boom','boom','boom'))
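Answered 2018-09-12 07:59:20
After grouping by 'group' and nesting, loop over 'data' with map. For each nested data frame, summarise the 'label' columns by taking the first label whose corresponding value is not NA, gather the result into a single column while dropping NA (na.rm = TRUE), and then unnest, keeping only the columns of interest.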
mytest %>%
group_by(group) %>%
nest %>%
mutate(var = map(data, ~
.x %>%
summarise(label_x = label_x[!is.na(x)][1],
label_y = label_y[!is.na(y)][1]) %>%
gather(key, var, na.rm = TRUE) %>%
select(var))) %>%
select(-data) %>%
unnest
# A tibble: 2 x 2
# group var
# <chr> <chr>
#1 a bada
#2 b world
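Update
If there are more columns, build the unique value-column names first, then use map2 to loop over each value column together with its matching label column.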
nm1 <- unique(sub("label_", "", setdiff(names(mytest), "group")))
nm2 <- paste0("label_", nm1)
mytest %>%
group_by(group) %>%
nest %>%
mutate(var = map(data, ~
map2_chr(.x %>%
select(nm1),
.x %>%
select(nm2), ~
.y[!is.na(.x)][1]) %>%
na.omit %>%
tibble(var = .))) %>%
select(-data) %>%
unnest
# A tibble: 4 x 2
# group var
# <chr> <chr>
#1 a bada
#2 a noo
#3 b world
#4 b run
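The pairing in the update relies on every value column `v` having a matching `label_v` column. As a small illustration in plain base R (the column names of `mytest` are typed out by hand here, which is an assumption made only to keep the snippet self-contained), the two helper vectors come out as follows:

```r
# Column names of the full example, written out by hand for illustration.
cols <- c("group", "x", "other_var", "y", "another_var",
          "label_x", "label_y", "label_other_var", "label_another_var")

# Strip the "label_" prefix and de-duplicate to get the value columns.
nm1 <- unique(sub("label_", "", setdiff(cols, "group")))

# Rebuild the label column names in the same order as nm1.
nm2 <- paste0("label_", nm1)

print(nm1)  # "x" "other_var" "y" "another_var"
print(nm2)  # "label_x" "label_other_var" "label_y" "label_another_var"
```

Because `sub()` also strips the prefix from any value column that itself starts with `label_`, this sketch assumes no such value column exists.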
Answered 2018-09-12 09:10:45
This outputs the desired result for the smaller example:
library(tidyverse)

mytest <- tibble(group = c('a', 'a', 'a', 'b', 'b', 'b'),
x = c(NA,NA,NA,5,6,7),
y = c(3,5,6,NA,NA,NA),
label_x = c('hello','hello','hello','world','world','world'),
label_y = c('bada','bada','bada','boom','boom','boom'))
# Works for the reduced two-column example only: x and y are hard-coded.
extract_good_colnames <- function(df, subgroup){
  subset <- filter(df, group == subgroup)
  # If x has missing values in this group, y is the populated variable, and vice versa.
  if(sum(is.na(subset$x)) > 0){
    colname <- 'label_y'
  }else if(sum(is.na(subset$y)) > 0){
    colname <- 'label_x'
  }
  return(tibble(group = subgroup, var = as.character(subset[1, colname])))
}
groups <- unique(mytest$group)
map_df(groups, function(x) extract_good_colnames(mytest, x))
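The helper above hard-codes `x` and `y`. One possible way to extend the same per-group idea to any number of value/label pairs is sketched below in base R only. The names `value_cols` and `extract_labels` are hypothetical and not part of the original answers, and the sketch assumes every value column `v` has a matching `label_v` column.

```r
# Full example data from the question, as a plain data.frame.
mytest <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  x = c(NA, NA, NA, 5, 6, 7),
  other_var = c(NA, NA, NA, 1, 2, 3),
  y = c(3, 5, 6, NA, NA, NA),
  another_var = c(1, 2, 3, NA, NA, NA),
  label_x = c("hello", "hello", "hello", "world", "world", "world"),
  label_y = c("bada", "bada", "bada", "boom", "boom", "boom"),
  label_other_var = c("ak", "ak", "ak", "run", "run", "run"),
  label_another_var = c("noo", "noo", "noo", "bie", "bie", "bie"),
  stringsAsFactors = FALSE
)

# Hypothetical helper input: the value columns to inspect.
value_cols <- c("x", "other_var", "y", "another_var")

# Hypothetical helper: for one group's rows, return the labels of the
# value columns that contain at least one non-NA entry.
extract_labels <- function(d, value_cols) {
  labels <- vapply(value_cols, function(v) {
    # First label whose value is non-NA; NA_character_ if the column is all NA.
    d[[paste0("label_", v)]][!is.na(d[[v]])][1]
  }, character(1))
  labels <- unname(labels[!is.na(labels)])
  data.frame(group = rep(d$group[1], length(labels)),
             var = labels,
             stringsAsFactors = FALSE)
}

# Apply the helper to each group and stack the pieces.
pieces <- lapply(split(mytest, mytest$group), extract_labels,
                 value_cols = value_cols)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

On the example data this should give the four rows from the desired output (a/bada, a/noo, b/world, b/run). Splitting with `split()` plays the role that `nest()` plays in the tidyverse versions.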
https://stackoverflow.com/questions/52299170