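I'm looking for a way, in either Ruby or JavaScript, to get all matches of a regexp within a string, including overlapping ones.
Say I have str = "abcadc" and I want to find occurrences of a, followed by any number of characters, followed by c. The result I'm looking for is ["abc", "adc", "abcadc"]. Any ideas on how I can do this?
str.scan(/a.*c/) gives me ["abcadc"], and str.scan(/(?=(a.*c))/).flatten gives me ["abcadc", "adc"].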
Posted on 2016-01-23 22:19:46
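def matching_substrings(string, regex)
  # Anchor with \A and \z so the whole substring has to match;
  # ^ and $ would also match at line breaks inside the substring.
  anchored = /\A#{regex}\z/
  string.size.times.each_with_object([]) do |start_index, matching_substrings|
    start_index.upto(string.size.pred) do |end_index|
      substring = string[start_index..end_index]
      matching_substrings.push(substring) if substring =~ anchored
    end
  end
end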
matching_substrings('abcadc', /a.*c/) # => ["abc", "abcadc", "adc"]
matching_substrings('foobarfoo', /(\w+).*\1/)
# => ["foobarf",
# "foobarfo",
# "foobarfoo",
# "oo",
# "oobarfo",
# "oobarfoo",
# "obarfo",
# "obarfoo",
# "oo"]
matching_substrings('why is this downvoted?', /why.*/)
# => ["why",
# "why ",
# "why i",
# "why is",
# "why is ",
# "why is t",
# "why is th",
# "why is thi",
# "why is this",
# "why is this ",
# "why is this d",
# "why is this do",
# "why is this dow",
# "why is this down",
# "why is this downv",
# "why is this downvo",
# "why is this downvot",
# "why is this downvote",
# "why is this downvoted",
# "why is this downvoted?"]
Posted on 2016-01-23 22:35:17
In Ruby, you can get the expected result with the following:
str = "abcadc"
[/(a[^c]*c)/, /(a.*c)/].flat_map{ |pattern| str.scan(pattern) }.reduce(:+)
# => ["abc", "adc", "abcadc"]
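Whether this approach works for you depends heavily on what you are really trying to achieve.
I tried to put it into a single expression, but I couldn't get it to work. I would really like to know whether there is a theoretical reason why this cannot be done with a regular expression, or whether I just don't know enough about Ruby's regex engine, Oniguruma, to do it.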
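To illustrate the "depends on what you want" caveat, here is a minimal sketch that compares the two-pattern approach against a brute-force check of every substring. The input "abcadcaec" is a hypothetical example that is not part of the question; it has three c characters after the first a, so some matches are neither the shortest run nor the single greedy match.

```ruby
# Hypothetical input (not from the question): three "c"s follow the first "a".
str = "abcadcaec"

# The two-pattern approach: shortest runs plus the single greedy match.
two_patterns = [/(a[^c]*c)/, /(a.*c)/].flat_map { |pattern| str.scan(pattern) }.reduce(:+)

# Brute-force reference: test every substring against the anchored pattern.
all_substrings = (0...str.size).flat_map do |i|
  (i...str.size).map { |j| str[i..j] }
end
expected = all_substrings.grep(/\Aa.*c\z/).uniq

# Matches that the two-pattern approach does not produce.
missed = expected - two_patterns
p missed # => ["abcadc", "adcaec"]
```

So for inputs like this one, the two patterns return the shortest and the longest matches but skip the ones in between.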
Posted on 2016-01-23 23:09:31
You want all possible matches, including overlapping ones. As you've already noticed, the lookahead trick from "How to find overlapping matches with a regexp?" doesn't work for your case.
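A small sketch of why the lookahead falls short: the lookahead is attempted once at each position and captures only the first match the engine accepts there, so each start index contributes at most one substring. Switching between a greedy and a lazy quantifier changes which one you get, but not how many.

```ruby
str = "abcadc"

# Greedy: the longest match from each start position.
greedy = str.scan(/(?=(a.*c))/).flatten
# Lazy: the shortest match from each start position.
lazy = str.scan(/(?=(a.*?c))/).flatten

p greedy # => ["abcadc", "adc"]
p lazy   # => ["abc", "adc"]
```

Combining both results happens to cover this particular string, but a start position with three or more possible end points would still lose the matches in the middle.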
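In the general case, the only workable approach I can think of is to generate every possible substring of the string and check each one against an anchored version of the regex. It is brute force, but it works.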
Ruby:
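def all_matches(str, regex)
  # Collect every substring str[i, j - i] for 0 <= i <= j <= n.
  (n = str.length).times.reduce([]) do |subs, i|
    subs + [*i..n].map { |j| str[i, j - i] }
  end.uniq.grep(/\A#{regex}\z/) # \A and \z anchor to the whole substring
end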
all_matches("abcadc", /a.*c/)
#=> ["abc", "abcadc", "adc"]
Javascript:
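function allMatches(str, regex) {
  var i, j, len = str.length, subs = {};
  // Wrap the source in a group so the anchors apply to the whole pattern,
  // even when it contains a top-level alternation such as a|b.
  var anchored = new RegExp('^(?:' + regex.source + ')$');
  for (i = 0; i < len; ++i) {
    for (j = i; j <= len; ++j) {
      subs[str.slice(i, j)] = true; // object keys deduplicate substrings
    }
  }
  return Object.keys(subs).filter(function (s) { return anchored.test(s); });
}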
https://stackoverflow.com/questions/34964453