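I want to combine every four posts by an author into a single row of a wide data frame, and if fewer than four posts remain, combine those leftover posts as well (for example, an author with 11 posts should end up with 2 rows of 4 posts and 1 row of 3 posts).
Here is a sample of my data frame: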
name text
bee _ so we know that right
bee said so
alma hello,
alma Good to hear back from you.
bee I've currently written an application
alma I'm happy about it
bee It was not the last.
alma Will this ever stop.
alma Yet another line.
alma so
I want to turn it into:
name text
bee _ so we know that right said so I've currently written an application It was not the last.
alma hello, Good to hear back from you. I'm happy about it Will this ever stop.
alma Yet another line. so
Here is the initial data frame:
df = structure(list(name = c("bee", "bee", "alma", "alma", "bee", "alma", "bee", "alma", "alma", "alma"), text = c( "_ so we know that right", "said so", "hello,", "Good to hear back from you.", "I've currently written an application", "I'm happy about it", "It was not the last.", "Will this ever stop.", "Yet another line.", "so")), .Names = c("name", "text"), row.names = c(NA, -10L), class = "data.frame")
Posted on 2019-12-14 19:17:38
One option using dplyr could be:
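library(dplyr)

df %>%
  group_by(name) %>%
  # number each author's posts in blocks of four: rows 1-4 -> 1, rows 5-8 -> 2, ...
  mutate(ID = ceiling(row_number()/4)) %>%
  group_by(name, ID) %>%
  # paste the texts of each block into one string
  summarise_all(paste, collapse = " ")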
name ID text
<chr> <dbl> <chr>
1 alma 1 hello, Good to hear back from you. I'm happy about it Will this ever stop.
2 alma 2 Yet another line. so
3 bee 1 _ so we know that right said so I've currently written an application It was…
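For comparison, the same chunk-and-paste idea can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the answer above: the names `combined` and `chunk_sizes` are made up for the example, and the data frame is rebuilt from the question's sample so the snippet runs on its own.

```r
# Sample data from the question, rebuilt here so the sketch is self-contained.
df <- data.frame(
  name = c("bee", "bee", "alma", "alma", "bee", "alma", "bee", "alma", "alma", "alma"),
  text = c("_ so we know that right", "said so", "hello,",
           "Good to hear back from you.", "I've currently written an application",
           "I'm happy about it", "It was not the last.", "Will this ever stop.",
           "Yet another line.", "so"),
  stringsAsFactors = FALSE
)

# Block index within each author: posts 1-4 -> 1, posts 5-8 -> 2, and so on.
df$ID <- ave(seq_along(df$name), df$name,
             FUN = function(i) ceiling(seq_along(i) / 4))

# Paste the texts of each (name, ID) block into one string.
combined <- aggregate(text ~ name + ID, data = df, FUN = paste, collapse = " ")

# Sort by author and block, then reset the row names.
combined <- combined[order(combined$name, combined$ID), ]
rownames(combined) <- NULL

# The 11-post case from the question: block sizes come out as 4, 4 and 3.
chunk_sizes <- table(ceiling(seq_len(11) / 4))
```

The key step in both versions is `ceiling(position / 4)`, which turns a running position within each author into a block number; the last block simply holds whatever is left over.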
https://stackoverflow.com/questions/59334585