Q: Merging separate lines of text and grouping them by a specific pattern

Stack Overflow user

Asked 2019-04-08 08:47:38

Answers: 1, Views: 41, Followers: 0, Score: -1

To cut through the clutter, here is the problem:

data = readLines("file.txt")

# data reads
[1] "JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness."
[2] " However, nobody knew it was happening."
[3] " SAM [Day 1, 9:01 A.M.]: I thought it was not true."
[4] " But it was."
[5] " I thought it was "present" but it wasn't."

What I am trying to do is: (1) merge the text by name (JESSICA, SAM).

I can identify the names in the data:

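# Match an upper-case name followed by " [" at the start of a line,
# allowing the leading space that some lines in the file have.
test = regexpr("^\\s*[A-Z]+ \\[", data)
names = regmatches(data, test)
# Drop the bracket, then trim the surrounding spaces left around the name.
final.name = trimws(sub("\\[", "", names))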

[1] "JESSICA" "SAM"

I can also pick out the date and time in the data:

test = regexpr("\\[(.*)\\]", data)
time = regmatches(data,test)

[1] "[Day 1, 9:00 A.M.]" "[Day 1, 9:01 A.M.]"

Where I am stuck is merging the separate lines that belong to each name. That is, instead of this:

[1] "JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness."
[2] " However, nobody knew it was happening."

I want each element to be:

[1] "JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness. However, nobody knew it was happening."
[2] " SAM [Day 1, 9:01 A.M.]: I thought it was not true. But it was. I thought it was "present" but it wasn't."

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-04-08 09:49:19

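The logic is similar to @Maurits' now-deleted answer. We can create groups based on where the names in final.name occur, then summarise the text by pasting together the lines within each group. I have assumed that data is a single-column data frame, since a data frame is easier to work with here than a plain character vector.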

library(dplyr)

data %>%
  group_by(group = cumsum(grepl(paste0(final.name, collapse = "|"), statement))) %>%
  summarise(statement = paste0(statement, collapse = " ")) %>%
  ungroup() %>%
  select(-group)


#statement                                                                                                 
#    <chr>                                                                                                     
#1 JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness.  However, nobody knew it was happening.
#2 SAM [Day 1, 9:01 A.M.]: I thought it was not true.  But it was.  I thought it was present but it wasn't.
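
To see why the grouping works, it can help to look at the intermediate vector on its own. The sketch below rebuilds the same sample data listed at the end of this answer and only inspects the group index; `is_start` and `group` are illustrative names that do not appear in the original answer.

```r
# Same sample data as in this answer (assumed: a one-column data frame).
data <- data.frame(statement = c(
  "JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness.",
  " However, nobody knew it was happening.",
  "SAM [Day 1, 9:01 A.M.]: I thought it was not true.",
  " But it was.",
  " I thought it was present but it wasn't."),
  stringsAsFactors = FALSE)
final.name <- c("JESSICA", "SAM")

# TRUE on every line that contains one of the speaker names.
is_start <- grepl(paste0(final.name, collapse = "|"), data$statement)
is_start
# [1]  TRUE FALSE  TRUE FALSE FALSE

# cumsum() increments at each TRUE, so every name line opens a new group.
group <- cumsum(is_start)
group
# [1] 1 1 2 2 2
```

Every continuation line inherits the group number of the most recent name line, which is what lets `summarise()` paste them together.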

With base R, the same grouping can be done with aggregate:

aggregate(statement~cumsum(grepl(paste0(final.name, collapse = "|"), statement)), 
                    data, paste0, collapse = " ")[2]
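
If the lines are still a plain character vector, as returned by `readLines()` in the question, a similar grouping can probably be done without building a data frame first. This is only a sketch under that assumption: `lines`, `pattern` and `merged` are illustrative names, and the start-of-line anchor and the `trimws()` clean-up are additions that the original answer does not use.

```r
# Lines as readLines() would return them (copied from the question).
lines <- c(
  "JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness.",
  " However, nobody knew it was happening.",
  " SAM [Day 1, 9:01 A.M.]: I thought it was not true.",
  " But it was.",
  " I thought it was \"present\" but it wasn't.")
final.name <- c("JESSICA", "SAM")

# A line opens a new statement when a known name, followed by " [",
# appears at its start (optionally after whitespace).
pattern <- paste0("^[[:space:]]*(", paste0(final.name, collapse = "|"), ") \\[")
group <- cumsum(grepl(pattern, lines))

# Trim stray spaces, split the lines by group, and paste each group
# into a single string.
merged <- unname(vapply(split(trimws(lines), group),
                        paste, character(1), collapse = " "))
merged
```

Anchoring the pattern at the start of the line is a design choice: it avoids opening a new group when a name merely appears inside someone's sentence.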

data

data <- data.frame(statement = c(
       "JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness.",
       " However, nobody knew it was happening.",
       "SAM [Day 1, 9:01 A.M.]: I thought it was not true.",
       " But it was.",
       " I thought it was present but it wasn't."))

final.name <- c("JESSICA", "SAM")
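
The data frame above was typed in by hand. Starting from a file as in the question, one way to get the same shape is sketched below. The temporary file merely stands in for `file.txt`, and `stringsAsFactors = FALSE` is set explicitly because R versions before 4.0 convert strings to factors by default.

```r
# Write two sample lines to a temporary file that stands in for "file.txt".
path <- tempfile(fileext = ".txt")
writeLines(c(
  "JESSICA [Day 1, 9:00 A.M.]: When there is sun, there was darkness.",
  " However, nobody knew it was happening."), path)

# Read the lines and wrap them in a one-column data frame named `statement`,
# the column name that the code in this answer refers to.
data <- data.frame(statement = readLines(path), stringsAsFactors = FALSE)
str(data)
```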

Score: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/55564963
