We have a string such as:
s = u'apple banana lemmon (hahaha) dog cat whale (hehehe) red blue black'
I want to create the following lists:
including = ['hahaha', 'hehehe']
excluding = ['apple banana lemmon (', ') dog cat whale (', ') red blue black']
The first list is straightforward with a regex:
including = re.findall(r'\((.*?)\)', s)
But I can't find anything similar for the other list. Could you help me? Thanks in advance!
Posted on 2018-08-22 15:10:51
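excluding = re.split('|'.join(including), s)
This works for the simple case where you know the captured text contains no special characters or regex syntax.
If you are not sure that is the case:
re.split('|'.join(map(re.escape, including)), s)
This escapes the special regex characters, which would otherwise cause re.split to misbehave.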
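To make the difference concrete, here is a minimal sketch with a made-up input string (not from the question) whose parenthesized parts contain the regex metacharacters + and a dot. The variable names unescaped and escaped are just for illustration.

```python
import re

# Hypothetical input: the parenthesized parts contain regex metacharacters.
s = 'x (a+b) y (c.d) z'

including = re.findall(r'\((.*?)\)', s)

# Unescaped: 'a+b' is read as "one or more 'a', then 'b'", so it never
# matches the literal text 'a+b'; only 'c.d' happens to match.
unescaped = re.split('|'.join(including), s)

# Escaped: every captured piece is matched literally.
escaped = re.split('|'.join(map(re.escape, including)), s)

print(including)
print(unescaped)
print(escaped)
```

Keep in mind that both variants split on every occurrence of a captured piece, so if the same text also appears outside the parentheses, the string is split there too.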
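Posted on 2018-08-22 15:25:02
With RegEx
a = re.findall(r'\)?[^()]*\(?', s)
# chunks alternate: outside, inside, outside, ...
excluded = a[::2]
included = a[1::2]  # the last item is an empty match at the end of s
print(included, excluded, sep='\n')
['hahaha', 'hehehe', '']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
Handling the empty string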
a = re.findall(r'\)?[^()]*\(?', s)
# filter(bool, ...) drops the empty matches
excluded = [*filter(bool, a[::2])]
included = [*filter(bool, a[1::2])]
print(included, excluded, sep='\n')
['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
Without RegEx
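from itertools import cycle

def f(s):
    c = cycle('()')       # alternate between searching for '(' and ')'
    a = {'(': 1, ')': 0}  # keep '(' at the end of a chunk; leave ')' for the next one
    while s:
        d = next(c)
        i = s.find(d)
        if i > -1:
            j = a[d]
            yield d, s[:i + j]
            s = s[i + j:]
        else:
            # delimiter not found: the rest of the string is the last chunk
            yield d, s
            break

included = []
excluded = []
for k, v in f(s):
    # chunks found while searching for '(' lie outside the parentheses
    if k == '(':
        excluded.append(v)
    else:
        included.append(v)
print(included, excluded, sep='\n')
['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
Same idea without overwriting s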
from itertools import cycle

def f(s):
    c = cycle('()')       # alternate between searching for '(' and ')'
    a = {'(': 1, ')': 0}  # keep '(' at the end of a chunk; leave ')' for the next one
    j = 0                 # start index of the current chunk
    while True:
        d = next(c)
        i = s.find(d, j)
        if i > -1:
            k = a[d]
            yield d, s[j:i + k]
            j = i + k
        else:
            # delimiter not found: the rest of the string is the last chunk
            yield d, s[j:]
            break

included = []
excluded = []
for k, v in f(s):
    if k == '(':
        excluded.append(v)
    else:
        included.append(v)
print(included, excluded, sep='\n')
['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
Posted on 2018-08-22 15:22:51
You can use a positive lookbehind and a positive lookahead to split on the text inside the parentheses:
>>> re.split(r'(?<=\().*?(?=\))', s)
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
https://stackoverflow.com/questions/51969716