
regmatches to return multiple matches from gregexpr

Stack Overflow user
Asked 2016-05-12 23:59:05
Answers 3, Views 880, Followers 0, Votes 2

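I want to grab the 2-3 words before and after the word "help".

I have a text like the following:

....features and lots of greenery to help soothe the nerves...blah blah...cozy up in their plush blankets to help relax the nerves

This is what I did: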

x <- paste("(\\S+\\s+|^)(\\S+\\s+|)(\\S+\\s+|)", treatSym[i], ".?(\\s+\\S+|)(\\s+\\S+|$)(\\s+\\S+|$)", sep="")

matching <- gregexpr(x,text)

regmatches(text, matching, invert = FALSE)

I get the error below, I guess because length(matching) = 2. However, it works fine when there is only one match.

Error in regmatches(text, matching, invert = FALSE) : 
  ‘x’ and ‘m’ must have the same length

Is there a better way to pull out the 2-3 words before and after the keyword?

3 Answers

Stack Overflow user

Accepted answer

Posted on 2016-05-13 01:29:52

n is a vector of length 2 giving the number of words to capture before and after the keyword.

n <- c(2, 2)
x <- "....features and lots of greenery to help soothe the nerves...blah blah...cozy up in their plush blankets to help relax the nerves"

pat <- sprintf('(?:[a-z]+ ){%s}help(?: [a-z]+){%s}', n[1], n[2])
m <- gregexpr(pat, x, perl = TRUE)
regmatches(x, m)[[1]]
# [1] "greenery to help soothe the" "blankets to help relax the" 
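
To see why the `[[1]]` is needed, it may help to look at what `gregexpr` returns. The following is a minimal sketch using only base R; the variable names `starts` and `lens` are introduced here for illustration. `gregexpr` returns a list with one element per input string, and each element can carry several match positions, which is how a single string yields multiple matches.

```r
x <- "....features and lots of greenery to help soothe the nerves...blah blah...cozy up in their plush blankets to help relax the nerves"
m <- gregexpr("(?:[a-z]+ ){2}help(?: [a-z]+){2}", x, perl = TRUE)

# One list element per input string, so x and m have the same length,
# which is what regmatches() requires.
length(m)

# Each element holds the start position of every match in that string,
# with the match lengths stored in the "match.length" attribute.
starts <- as.integer(m[[1]])
lens <- attr(m[[1]], "match.length")

# Cutting the matches out by hand gives the same strings as regmatches(x, m)[[1]].
substring(x, starts, starts + lens - 1)
```
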

As a function:

f <- function(string, keyword, n = c(2,2)) {
  # pat <- sprintf('(?:[a-z]+ ){%s}%s(?: [a-z]+){%s}', n[1], keyword, n[2])
  pat <- sprintf('(?:[a-z]+ ){0,%s}%s(?: [a-z]+){0,%s}', n[1], keyword, n[2])
  m <- gregexpr(pat, string, perl = TRUE)
  regmatches(string, m)[[1]]
}

f(x, 'help', c(1, 2))
# [1] "to help soothe the" "to help relax the" 
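
Because `f` ends with `[[1]]`, it only reports matches for the first string it is given. If the input may be a character vector of several texts, one possible variant is sketched below. The name `context_words` and the sample `texts` are hypothetical, not part of the answer. Note that, as in `f`, the keyword is pasted into the pattern unescaped, so it is assumed to contain no regex metacharacters.

```r
# Hypothetical variant of f() that keeps the full list returned by regmatches(),
# so every input string gets its own vector of matches.
context_words <- function(strings, keyword, n = c(2, 2)) {
  pat <- sprintf("(?:[a-z]+ ){0,%d}%s(?: [a-z]+){0,%d}",
                 as.integer(n[1]), keyword, as.integer(n[2]))
  m <- gregexpr(pat, strings, perl = TRUE)
  regmatches(strings, m)  # list with one character vector per input string
}

texts <- c("lots of greenery to help soothe the nerves",
           "plush blankets to help relax the nerves",
           "nothing relevant here")
res <- context_words(texts, "help")
res
```

Strings without the keyword come back as empty character vectors, so the result always has the same length as the input.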
Votes 4

Stack Overflow user

Posted on 2016-05-13 01:10:09

Another option is to split the text into words, find the index of each help, and take the 2 or 3 words before and after each one.

library(magrittr)
library(stringi)
library(SOfun)  ### https://github.com/mrdwab/SOfun

x <- "....features and lots of greenery to help soothe the nerves...blah blah...cozy up in their plush blankets to help relax the nerves"

Option 1: Just get the words

### Remove ... and split words
temp <- stri_replace_all_regex(pattern = "[[:punct:]]", replacement = " ", str = x) %>%
        stri_split_fixed(pattern = " ") %>%
        unlist %>%
        .[nchar(.) > 0]

data.frame(word = temp, stringsAsFactors = FALSE) %>%
getMyRows(pattern = grep(pattern = "help", x = .$word), range = -3:3) %>%
lapply(function(ana){ana[-grep(pattern = "help", x = ana)]})

#[[1]]
#[1] "of"       "greenery" "to"       "soothe"   "the"      "nerves"  
#
#[[2]]
#[1] "plush"    "blankets" "to"       "relax"    "the"      "nerves" 
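
`getMyRows` comes from the SOfun package, which the answer loads from GitHub. If installing it is not an option, the same split-and-index idea can be sketched in base R alone. This is an illustrative rewrite rather than the answer's code; `words`, `hits`, `window`, and `context` are names introduced here.

```r
x <- "....features and lots of greenery to help soothe the nerves...blah blah...cozy up in their plush blankets to help relax the nerves"

# Replace punctuation with spaces, split on whitespace, drop empty strings.
words <- strsplit(gsub("[[:punct:]]", " ", x), "\\s+")[[1]]
words <- words[nzchar(words)]

# Positions of the keyword, and how many words to keep on each side.
hits <- which(words == "help")
window <- 3

# For each hit, take the surrounding indices (clamped to the text bounds)
# and drop the keyword itself.
context <- lapply(hits, function(i) {
  idx <- seq(max(1, i - window), min(length(words), i + window))
  words[setdiff(idx, i)]
})
context
```

Unlike the `grep` call in the answer, `words == "help"` matches the exact word only, so words such as "helpful" would not count as hits.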

If you want to see which words were selected for each help, you can try the following.

Option 2: Create a data frame

temp <- stri_replace_all_regex(pattern = "[[:punct:]]", replacement = " ", str = x) %>%
        stri_split_fixed(pattern = " ") %>%
        unlist %>%
        .[nchar(.) > 0]

data.frame(word = temp, stringsAsFactors = FALSE) %>%
getMyRows(pattern = grep(pattern = "help", x = .$word), range = -3:3) %>%
lapply(function(ana){ana[-grep(pattern = "help", x = ana)]}) -> temp


do.call(rbind,
        lapply(temp, function(y){
                        data.frame(word = y,
                                   ind = c(-3:-1, 1:3),
                                   stringsAsFactors = FALSE)}
              )
        )

# ind gives the position of each word relative to help: negative values
# are words to the left of help, positive values are words to the right.

#       word ind
#1        of  -3
#2  greenery  -2
#3        to  -1
#4    soothe   1
#5       the   2
#6    nerves   3
#7     plush  -3
#8  blankets  -2
#9        to  -1
#10    relax   1
#11      the   2
#12   nerves   3
Votes 2

Stack Overflow user

Posted on 2016-05-13 00:39:12

You can do something similar with the quanteda package.

my.string <- "....features and lots of greenery to help soothe the nerves...blah blah...cozy up in their plush blankets to help relax the nerves"

library(quanteda)
kwic(my.string, "help", window = 3, valuetype = "fixed")

                     contextPre keyword         contextPost
[text1, 11]    of greenery to [    help ] soothe the nerves
[text1, 30] plush blankets to [    help ] relax the nerves 
Votes 1

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/37199262
