Q: How to get possibly overlapping matches in a string

Stack Overflow user

Asked 2016-01-23 22:07:17

7 answers · 2.2K views · 0 followers · 22 votes

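I'm looking for a way, in either Ruby or JavaScript, to get all matches of a regexp within a string, including overlapping ones.

Let's say I have str = "abcadc", and I want to find occurrences of a, followed by any number of characters, followed by c. The result I'm looking for is ["abc", "adc", "abcadc"]. Any ideas on how I can accomplish this?

str.scan(/a.*c/) gives me ["abcadc"], and str.scan(/(?=(a.*c))/).flatten gives me ["abcadc", "adc"].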
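
For reference, here is a minimal sketch of why both attempts fall short. The variable names are illustrative only and are not part of the original question. It suggests that the lookahead trick finds at most one capture per start index, so a start index with several valid end positions is never fully covered:

```ruby
str = "abcadc"

# Plain scan consumes the text it matches, so only the greedy match survives.
consumed = str.scan(/a.*c/)                 # ["abcadc"]

# A lookahead consumes nothing, so every start index is tried,
# but each start still yields just one (greedy) capture.
greedy = str.scan(/(?=(a.*c))/).flatten     # ["abcadc", "adc"]

# A lazy quantifier picks the shortest capture per start instead.
lazy = str.scan(/(?=(a.*?c))/).flatten      # ["abc", "adc"]
```

Combining the greedy and lazy results happens to cover this particular string, but that should not be expected to hold once a start index has three or more valid endings.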

ruby

regex

javascript

7 Answers

Stack Overflow user

Accepted answer

Posted on 2016-01-23 22:19:46

def matching_substrings(string, regex)
  # \A and \z anchor to the whole substring (^ and $ only anchor lines in Ruby).
  anchored = /\A#{regex}\z/
  string.size.times.each_with_object([]) do |start_index, matches|
    start_index.upto(string.size.pred) do |end_index|
      substring = string[start_index..end_index]
      matches.push(substring) if substring =~ anchored
    end
  end
end

matching_substrings('abcadc', /a.*c/) # => ["abc", "abcadc", "adc"]
matching_substrings('foobarfoo', /(\w+).*\1/) 
  # => ["foobarf",
  #     "foobarfo",
  #     "foobarfoo",
  #     "oo",
  #     "oobarfo",
  #     "oobarfoo",
  #     "obarfo",
  #     "obarfoo",
  #     "oo"]
matching_substrings('why is this downvoted?', /why.*/)
  # => ["why",
  #     "why ",
  #     "why i",
  #     "why is",
  #     "why is ",
  #     "why is t",
  #     "why is th",
  #     "why is thi",
  #     "why is this",
  #     "why is this ",
  #     "why is this d",
  #     "why is this do",
  #     "why is this dow",
  #     "why is this down",
  #     "why is this downv",
  #     "why is this downvo",
  #     "why is this downvot",
  #     "why is this downvote",
  #     "why is this downvoted",
  #     "why is this downvoted?"]

11 votes

Stack Overflow user

Posted on 2016-01-23 22:35:17

In Ruby, you can get the expected result with the following:

str = "abcadc"
[/(a[^c]*c)/, /(a.*c)/].flat_map{ |pattern| str.scan(pattern) }.reduce(:+)
# => ["abc", "adc", "abcadc"]

Whether this works for you depends heavily on what you really want to achieve.
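
To illustrate how much this depends on the input, here is a small sketch on a longer string. The input "abcadcaec" and the names patterns, found, and missing are assumptions made for illustration. With three a...c segments, the two patterns find only the shortest matches and the single longest one, so the intermediate matches are skipped:

```ruby
str = "abcadcaec"
patterns = [/(a[^c]*c)/, /(a.*c)/]

# Same approach as above: shortest matches first, then the greedy one.
found = patterns.flat_map { |pattern| str.scan(pattern) }.reduce(:+)
# found => ["abc", "adc", "aec", "abcadcaec"]

# Both of these start with "a" and end with "c", yet neither pattern reports them.
missing = ["abcadc", "adcaec"] - found
# missing => ["abcadc", "adcaec"]
```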

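I tried to put this into a single expression, but I couldn't get it to work. I'd really like to know whether there is some scientific reason this can't be parsed by regular expressions, or whether I just don't know enough about Ruby's parser Oniguruma to write it.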

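11 votes

Stack Overflow user

Posted on 2016-01-23 23:09:31

You want all possible matches, including overlapping ones. As you've already noted, the lookahead trick from "How to find overlapping matches with a regexp?" doesn't work for your case.

The only approach I can think of that works in the general case is to generate all possible substrings of the string and check each one against an anchored version of the regex. This is brute force, but it works.

Ruby: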

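def all_matches(str, regex)
  # \A and \z anchor to the whole substring (^ and $ only anchor lines in Ruby).
  anchored = /\A#{regex}\z/
  (n = str.length).times.reduce([]) do |subs, i|
    subs + [*i..n].map { |j| str[i, j - i] }
  end.uniq.grep(anchored)
end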

all_matches("abcadc", /a.*c/) 
#=> ["abc", "abcadc", "adc"]

JavaScript:

function allMatches(str, regex) {
  var i, j, len = str.length, subs = {};
  // Wrap the source in a group so alternations such as /a|b/ stay anchored on both sides.
  var anchored = new RegExp('^(?:' + regex.source + ')$');
  for (i = 0; i < len; ++i) {
    for (j = i; j <= len; ++j) {
      subs[str.slice(i, j)] = true;
    }
  }
  return Object.keys(subs).filter(function(s) { return anchored.test(s); });
}
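
Both versions above collapse identical substrings found at different positions into one entry. If positions matter, a variant along the following lines might help. This is only a sketch: the name all_matches_with_offsets is hypothetical, and it assumes Ruby 2.7 or later for filter_map.

```ruby
# Hypothetical helper: returns [offset, substring] pairs for every
# non-empty substring that the regex matches in full.
def all_matches_with_offsets(str, regex)
  # Interpolating the Regexp keeps its options and wraps it in a group.
  anchored = /\A#{regex}\z/
  (0...str.length).flat_map do |start|
    (start + 1..str.length).filter_map do |stop|
      substring = str[start...stop]
      [start, substring] if anchored.match?(substring)
    end
  end
end

all_matches_with_offsets("abcadc", /a.*c/)
# => [[0, "abc"], [0, "abcadc"], [3, "adc"]]
```

It is still brute force, checking every start and end pair, so it is probably only practical for short strings.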

8 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/34964453
