How do I create columns of the words before and after a certain keyword in R?

Stack Overflow user

Asked on 2021-01-09 05:53:02

1 answer, 50 views, 0 followers, 0 votes

My input is a CSV file containing the table below, with a Sentence column and a Class column.

Sentence                            Class
Joe just joined Alice on the set.   B
Alexis buys green apples            C
Yesterday, two friends unite.       A
Combination between x and y!        A

Each class has a ranked list of keywords (not stored in the CSV).

Class A keyword list    Class B keyword list    Class C keyword list
unite                   joined                  buy
combination             join                    buys
together                merge                   bought

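My output needs to be a CSV with added columns holding the words before and after the highest-ranked keyword from the class keyword list that appears in that sentence (see the image below).

Note that some columns are blank because the corresponding word does not exist in that sentence.

How can I do this in R?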

regex

text

tidyverse

multiple-columns

1 Answer

Stack Overflow user

Answered on 2021-01-09 16:15:26

Assuming you have imported the file and converted it to the following format:

library(tidyverse)  # provides tribble, the dplyr/tidyr/purrr verbs, and stringr

df <- tribble(
  ~ Sentence, ~ Class,
  "Joe just joined Alice on the set.", "B",
  "Alexis buys green apples", "C",
  "Yesterday, two friends unite.", "A",
  "Combination between x and y!", "A"
)

kw_list <- list(
  A=c("unite", "combination", "together"),
  B=c("joined", "join", "merge"),
  C=c("buy", "buys", "bought")
)
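
The import step assumed above is not shown in the answer. As a rough sketch, it might look like the following with base R's `read.csv`, assuming the file has a header row and quotes any sentence that contains a comma. The inline `csv_text` stands in for a real file path, and it yields a plain data.frame rather than a tibble.

```r
# Assumption: header row present, sentences containing commas are quoted.
# `text =` replaces a real path such as "sentences.csv" (a hypothetical name).
csv_text <- 'Sentence,Class
"Joe just joined Alice on the set.",B
"Alexis buys green apples",C
"Yesterday, two friends unite.",A
"Combination between x and y!",A'

# stringsAsFactors = FALSE keeps both columns as character vectors
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df)
```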

Then you can get the data frame shown in the image as follows:

result <- df %>% mutate(res=map2(Sentence, Class, function(sentence, class){
  # Strip , . and ! then split on spaces and lower-case every word
  word_list <- sentence %>% str_replace_all("[,.!]", "") %>%
    str_split(" ") %>% .[[1]] %>% str_to_lower()
  # Keywords of this class that occur in the sentence, kept in ranked order
  kws <- intersect(kw_list[[class]], word_list)
  if(length(kws)==0){
    return(NA)
  }else{
    kws %>% map(function(kw){
      # Position of the first exact occurrence of the keyword
      position <- match(kw, word_list)
      # Up to three words before the keyword, nearest first, padded with NA
      left_kw <- if(position!=1){
        word_list[1:(position-1)] %>% rev() %>% .[1:3] %>%
          tibble(name=c("1st", "2nd", "3rd") %>% str_c(" word from left"), value=.) %>%
          arrange(desc(name)) %>% pivot_wider()
      }else{
        NULL
      }
      # Up to three words after the keyword, padded with NA
      right_kw <- if(position!=length(word_list)){
        word_list[(position+1):length(word_list)] %>% .[1:3] %>%
          tibble(name=c("1st", "2nd", "3rd") %>% str_c(" word from right"), value=.) %>%
          pivot_wider()
      }else{
        NULL
      }
      bind_cols(left_kw, tibble(`key word`=kw), right_kw)
    }) %>% bind_rows()  # one row per keyword found
  }
})) %>% unnest(cols=res)

This handles sentences that contain several keywords as well as sentences that contain none.
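
The question asks only for the highest-ranked keyword in each sentence, whereas the code above returns one row per matching keyword. Selecting just the top one could be sketched in base R as follows. `top_keyword` is a hypothetical helper name, not part of the original answer, and like the answer it strips only `,`, `.` and `!`.

```r
# Ranked keyword lists per class, highest rank first (same data as kw_list above)
kw_list <- list(
  A = c("unite", "combination", "together"),
  B = c("joined", "join", "merge"),
  C = c("buy", "buys", "bought")
)

# Hypothetical helper: the highest-ranked keyword of `class` that occurs
# in `sentence`, or NA_character_ when no keyword occurs.
top_keyword <- function(sentence, class) {
  cleaned <- gsub("[,.!]", "", sentence)
  words <- tolower(strsplit(cleaned, " ", fixed = TRUE)[[1]])
  hits <- intersect(kw_list[[class]], words)  # keeps the ranked order of kw_list
  if (length(hits) == 0) NA_character_ else hits[1]
}

top_keyword("Alexis buys green apples", "C")
top_keyword("We join today and merge tomorrow", "B")
```

Because `intersect` keeps the order of its first argument, putting the ranked list first is what makes `hits[1]` the highest-ranked match.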

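Note that all letters are converted to lowercase, and the code will not work properly if the sentences contain symbols other than , . and !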
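
One possible way around the punctuation limitation is to drop every punctuation character using the POSIX class `[[:punct:]]`. The sketch below uses base R only. `tokenize` is a hypothetical helper name, and removing all punctuation is an assumption that may not suit every input, since it also deletes apostrophes and hyphens inside words.

```r
# Hypothetical helper: lower-case a sentence and split it into words,
# dropping every punctuation character rather than only , . and !
tokenize <- function(sentence) {
  cleaned <- gsub("[[:punct:]]", "", tolower(sentence))
  words <- strsplit(trimws(cleaned), "[[:space:]]+")[[1]]
  words[nzchar(words)]  # guard against empty strings
}

tokenize("Wait... did Joe (really) join?")
```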

Of course, this is rather long and not the best solution, but I hope it helps.
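
For comparison, the core step of collecting neighbouring words can be written more compactly in base R. This is only a sketch under stated assumptions: `context_words` is a hypothetical helper, it expects an already lower-cased, tokenized word vector, and it uses the first occurrence of the keyword.

```r
# Hypothetical helper: up to `n` words on each side of the first occurrence
# of `kw` in `words`; missing neighbours are returned as NA.
context_words <- function(words, kw, n = 3) {
  pos <- match(kw, words)                 # first exact match, NA if absent
  if (is.na(pos)) return(NULL)
  left  <- rev(words[seq_len(pos - 1)])[seq_len(n)]  # nearest word first
  right <- words[-seq_len(pos)][seq_len(n)]          # words after the keyword
  list(left = left, keyword = kw, right = right)
}

words <- c("joe", "just", "joined", "alice", "on", "the", "set")
context_words(words, "joined")
```

Indexing past the end of a vector yields `NA` in R, which is what pads the result when fewer than `n` neighbours exist.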

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/65636910
