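I want to split the following string on the word 'and', except where 'and' appears inside quotes.
string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"
Desired result:
["section_category_name = 'computer and equipment expense'","date >= 2015-01-01","date <= 2015-03-31"]
I can't seem to find a regex pattern that splits the string correctly, so that 'computer and equipment expense' stays in one piece.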
Here is what I have tried:
re.split('and',string)
Result:
[" section_category_name = 'computer "," equipment expense' ",' date >= 2015-01-01 ',' date <= 2015-03-31']
As you can see, the result splits 'computer and equipment expense' into separate items in the list.
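A minimal sketch of why the plain pattern falls short, using a made-up string (`text` is an assumption for illustration, not data from the question). The pattern 'and' matches those letters anywhere, even inside another word. Adding `\b` word boundaries fixes only that part; an 'and' inside quotes is still treated as a separator.

```python
import re

# Hypothetical example string, not taken from the question.
text = "brand = 'salt and pepper' and price > 3"

# Plain pattern: matches the letters "and" anywhere, including inside "brand".
plain = re.split('and', text)

# Word boundaries stop the match inside "brand", but not inside the quotes.
bounded = re.split(r'\band\b', text)

print(plain)
print(bounded)
```

Neither variant knows anything about quoting, which is why the quoted phrase still comes back in two pieces.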
I also tried the following, taken from another question:
r = re.compile('(?! )[^[]+?(?= *\[)'
               '|'
               '\[.+?\]')
r.findall(s)
Result:
[]
I also tried the following, from another question:
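result = re.split(r"and+(?=[^()]*(?:\(|$))", string)
Result:
[" section_category_name = 'computer ",
 " equipment expense' ",
 ' date >= 2015-01-01 ',
 ' date <= 2015-03-31']
The difficulty is that earlier questions on this topic do not cover splitting a string on a word that may also appear inside quotes; they cover splitting on special characters or whitespace.
I can get the desired result if I change the string to the following: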
string = " section_category_name = (computer and equipment expense) and date >= 2015-01-01 and date <= 2015-03-31"
result = re.split(r"and+(?=[^()]*(?:\(|$))", string)
Desired result:
[' section_category_name = (computer and equipment expense) ',
 ' date >= 2015-01-01 ',
 ' date <= 2015-03-31']
However, I need the function to leave 'and' unsplit inside apostrophes (single quotes), not inside parentheses.
Answered 2015-12-23 23:33:26
The following code will work, and it does not require a crazy regex to do it.
import re
# We create a "lexer" using regex. This will match strings surrounded by single quotes,
# words without any whitespace in them, and the end of the string. We then use finditer()
# to grab all non-overlapping tokens.
lexer = re.compile(r"'[^']*'|[^ ]+|$")
string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"
results = []
buff = []
# Iterate through all the tokens our lexer identified and parse accordingly
for match in lexer.finditer(string):
    token = match.group(0)  # group 0 is the entire matching string
    if token in ('and', ''):
        # Once we reach 'and' or the end of the string '' (matched by $),
        # we join all previous tokens with a space and add to our results.
        results.append(' '.join(buff))
        buff = []  # Reset for the next set of tokens
    else:
        buff.append(token)
print(results)
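To see why the loop works, it can help to look at the token stream the lexer produces. This is a small sketch using the same pattern and input as the answer; the `tokens` name is introduced here only for illustration. The quoted phrase arrives as a single token, so the 'and' inside it is never compared against the separator, and the empty string matched by `$` marks the end of the input.

```python
import re

# Same lexer pattern and input string as in the answer above.
lexer = re.compile(r"'[^']*'|[^ ]+|$")
string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"

# Collect every token the lexer matches, in order of appearance.
tokens = [m.group(0) for m in lexer.finditer(string)]

for token in tokens:
    print(repr(token))
```

Only the two bare 'and' tokens and the final empty token trigger a flush of the buffer, which gives the three expected pieces.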
Edit: here is a more concise version that effectively replaces the for loop in the code above with itertools.groupby.
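import re
from itertools import groupby
string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"
# Tokens are either a single-quoted string or a run of characters with no whitespace or quotes
lexer = re.compile(r"'[^']*'|[^\s']+")
# Group consecutive tokens by whether they are the separator 'and'
grouping = groupby(lexer.findall(string), lambda x: x == 'and')
# Keep only the non-separator groups and rejoin their tokens with spaces
results = [' '.join(g) for k, g in grouping if not k]
print(results)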
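For comparison, the split can also be sketched as a single `re.split` call with a lookahead, in the spirit of the parenthesis pattern from the question. This is not part of the answer above: the helper name `split_on_and` and the constant `OUTSIDE_QUOTES_AND` are hypothetical, and the sketch assumes single quotes are balanced and never escaped. The lookahead accepts an 'and' only when an even number of single quotes follows it, which means the match sits outside any quoted span.

```python
import re

# Split on a whitespace-delimited "and" only when the rest of the string
# contains an even number of single quotes (so the match is outside quotes).
# Assumption: quotes are balanced and never escaped.
OUTSIDE_QUOTES_AND = re.compile(r"\s+and\s+(?=(?:[^']*'[^']*')*[^']*$)")


def split_on_and(text):
    """Hypothetical helper: split text on 'and' outside single quotes."""
    return OUTSIDE_QUOTES_AND.split(text)


string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"
parts = split_on_and(string)
print(parts)
```

The lexer approach is easier to extend (for example to other separators or quote styles), while this variant stays close to the original `re.split` attempt, at the cost of rescanning the rest of the string for every candidate match.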
https://stackoverflow.com/questions/34444319