Q: Collapse all columns by an ID column

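Stack Overflow user

Asked on 2014-11-18 04:29:03

5 answers, 9.3K views, 0 followers, 15 votes

I'm trying to do something similar to what's answered here, which gets me 80% of the way. I have a data frame with one ID column and multiple information columns. I'd like to roll up all of the other columns so that there is only one row per ID, with multiple entries separated by, for instance, a semicolon. Here's an example of what I have and what I want.

Have: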

     ID  info1          info2
1 id101    one          first
2 id102   twoA second alias A
3 id102   twoB second alias B
4 id103 threeA  third alias A
5 id103 threeB  third alias B
6 id104   four         fourth
7 id105   five          fifth

Want:

     ID          info1                          info2
1 id101            one                          first
2 id102     twoA; twoB second alias A; second alias B
3 id103 threeA; threeB   third alias A; third alias B
4 id104           four                         fourth
5 id105           five                          fifth

Here's the code to generate these:

have <- data.frame(ID=paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
                   info1=c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"), 
                   info2=c("first", "second alias A", "second alias B", "third alias A", "third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)
want <- data.frame(ID=paste0("id", c(101:105)),
                   info1=c("one", "twoA; twoB", "threeA; threeB", "four", "five"), 
                   info2=c("first", "second alias A; second alias B", "third alias A; third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)

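This question asked basically the same thing, but with only a single "info" column. I have multiple other columns and would like to do this for all of them.

Bonus points for doing this using dplyr.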

dplyr

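5 Answers

Stack Overflow user

Accepted answer

Posted on 2014-11-18 04:35:27

Here's an option using summarise_each (which makes it easy to apply the change to all columns except the grouping variables) and toString: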

require(dplyr)

have %>%
  group_by(ID) %>%
  summarise_each(funs(toString))

#Source: local data frame [5 x 3]
#
#     ID          info1                          info2
#1 id101            one                          first
#2 id102     twoA, twoB second alias A, second alias B
#3 id103 threeA, threeB   third alias A, third alias B
#4 id104           four                         fourth
#5 id105           five                          fifth

Or, if you want the entries separated by semicolons, you can use:

have %>%
  group_by(ID) %>%
  summarise_each(funs(paste(., collapse = "; ")))
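
In current dplyr releases, summarise_each() and funs() are deprecated. A minimal sketch of the same idea using across() follows, assuming dplyr 1.0.0 or later is installed; the name `collapsed` is only illustrative and does not come from the original answer.

```r
library(dplyr)

# Example data from the question
have <- data.frame(
  ID    = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# everything() inside across() skips the grouping column (ID),
# so every remaining column is pasted together within each ID.
collapsed <- have %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ paste(.x, collapse = "; ")))
```

Here `collapsed` has one row per ID, and newly added info columns are picked up without changing the call.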

Votes: 17

Stack Overflow user

Posted on 2014-11-18 04:43:16

Good old aggregate handles this quite well.

aggregate(have[,2:3], by=list(have$ID), paste, collapse=";")

The question is: does it scale?

Votes: 12

Stack Overflow user

Posted on 2014-11-18 04:31:34

Here's a data.table solution.
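
The call above selects the info columns by position and leaves the grouping column with the default name Group.1. A small variation, offered only as a sketch, selects the columns by name and labels the grouping column ID; the names `info_cols` and `collapsed` are illustrative.

```r
# Example data from the question
have <- data.frame(
  ID    = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# Every column except ID gets collapsed
info_cols <- setdiff(names(have), "ID")

# Naming the list element in `by` names the grouping column in the result;
# `collapse` is passed through to paste() via aggregate's `...`.
collapsed <- aggregate(
  have[info_cols],
  by = list(ID = have$ID),
  FUN = paste, collapse = "; "
)
```

This uses only base R, so it needs no additional packages.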

library(data.table)
setDT(have)[, lapply(.SD, paste, collapse = "; "), by = ID]
#       ID          info1                          info2
# 1: id101            one                          first
# 2: id102     twoA; twoB second alias A; second alias B
# 3: id103 threeA; threeB   third alias A; third alias B
# 4: id104           four                         fourth
# 5: id105           five                          fifth
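
Note that setDT() converts `have` to a data.table by reference. If the original data frame should stay untouched, one possible variant (a sketch, assuming the data.table package is installed) works on a copy and names the columns to collapse explicitly through .SDcols; `dt` and `collapsed` are illustrative names.

```r
library(data.table)

# Example data from the question
have <- data.frame(
  ID    = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so `have` remains a plain data.frame
dt <- as.data.table(have)

# .SDcols restricts .SD to the listed columns; each is pasted within its ID group
collapsed <- dt[, lapply(.SD, paste, collapse = "; "),
                by = ID, .SDcols = c("info1", "info2")]
```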

Votes: 9

Page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/26981385
