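Basically, I want to split the following string into two separate strings, like so:
Input: LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second. MOBERG struck out swinging (2-2 BSSFBS).
Output: 'LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second.', 'MOBERG struck out swinging (2-2 BSSFBS).'
In my case, a new sentence always starts with a capital letter (i.e. the player's name). Here is the code I tried to accomplish this: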
import re
string = 'LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second. MOBERG struck out swinging (2-2 BSSFBS).'
x = re.findall(r"[A-Z].*?[.!?]", string, re.DOTALL)
print(x)
My code currently outputs the following, where the first string in the list is wrong:
['LIPCIUS, A.', 'FBF); AMMONS advanced to second.', 'MOBERG struck out swinging (2-2 BSSFBS).']
It should be ['LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second.', 'MOBERG struck out swinging (2-2 BSSFBS).']
Posted on 2017-06-22 10:54:52
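The regex below should work for you. After the closing punctuation it adds optional whitespace and a lookahead assertion that requires either a capital letter or the end of the string ($), so the match no longer stops at initials such as A. and B.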
import re
string = 'LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second. MOBERG struck out swinging (2-2 BSSFBS).'
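# Raw string avoids the invalid "\." escape; the lookahead requires a capital letter or end of string
x = re.findall(r"[A-Z].*?[.!?]\s?(?=[A-Z]|$)", string, re.DOTALL)
# ['LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second. ', 'MOBERG struck out swinging (2-2 BSSFBS).']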
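Note that the first item above keeps a trailing space, because the optional \s is part of the match. If that matters, a minimal sketch along these lines strips it. The helper name split_plays is made up for illustration and is not part of the answer; the pattern is the same one as above.

```python
import re

def split_plays(text):
    """Hypothetical helper: return each play as its own string, whitespace-trimmed."""
    # Same pattern as in the answer: lazily match from a capital letter up to
    # punctuation that is followed by another capital letter or the end of the string.
    pattern = r"[A-Z].*?[.!?]\s?(?=[A-Z]|$)"
    return [match.strip() for match in re.findall(pattern, text, re.DOTALL)]

text = ('LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second. '
        'MOBERG struck out swinging (2-2 BSSFBS).')
plays = split_plays(text)
print(plays)
```

This relies on the assumption stated in the question, namely that every new sentence starts with a capital letter. An initial that is directly followed by another capitalized word would still cause an unwanted split.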
Posted on 2017-06-22 10:54:58
import re
s = 'LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second. MOBERG struck out swinging (2-2 BSSFBS).'
l = re.split(r'[.][ ](?=[A-Z]+\b)', s)
print(l)
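The only catch is that the period at each split point is consumed, so every item except the last loses its trailing dot, but I guess that won't bother you.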
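If the trailing period should be kept, one possible variation is to move the punctuation into a lookbehind so that only the whitespace is consumed by the split. This is a sketch rather than part of the answer, and the function name split_keep_period is hypothetical.

```python
import re

def split_keep_period(text):
    """Hypothetical helper: split on whitespace between sentences, keeping punctuation."""
    # (?<=[.!?])   the whitespace must be preceded by sentence punctuation
    # \s+          this is the only text removed by the split
    # (?=[A-Z]+\b) the next word must be entirely upper case (a player name)
    return re.split(r"(?<=[.!?])\s+(?=[A-Z]+\b)", text)

s = ('LIPCIUS, A. grounded out to 3b (1-2 FBF); AMMONS advanced to second. '
     'MOBERG struck out swinging (2-2 BSSFBS).')
parts = split_keep_period(s)
print(parts)
```

As with the original split, this assumes player names are written in all capitals; a lower-case word after an initial, as in "A. grounded", does not trigger a split.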
https://stackoverflow.com/questions/44689193