Q: How can I split a string on a word in Python, except where the word appears inside quotes?

Stack Overflow user

Asked 2015-12-23 21:54:07

Answers 6 · Views 1.1K · Followers 0 · Votes 4

I want to split the following string on the word 'and', except where 'and' appears inside quotes.

string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"

Desired result

["section_category_name = 'computer and equipment expense'","date >= 2015-01-01","date <= 2015-03-31"]

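I can't seem to find a regex pattern that splits the string correctly, so that 'computer and equipment expense' stays in one piece.

Here is what I have tried: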

re.split('and',string)

Result

[" section_category_name = 'computer "," equipment expense' ",' date >= 2015-01-01 ',' date <= 2015-03-31']

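As you can see, the result splits 'computer and equipment expense' into separate list items.

I also tried the following, taken from another question: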

r = re.compile('(?! )[^[]+?(?= *\[)'
               '|'
               '\[.+?\]')
r.findall(s)

Result:

[]

I also tried the following, taken from another question:

result = re.split(r"and+(?=[^()]*(?:\(|$))", string)

Result:

[" section_category_name = 'computer ",
 " equipment expense' ",
 ' date >= 2015-01-01 ',
 ' date <= 2015-03-31']

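The difficulty is that earlier questions on this topic do not cover splitting a string on a word that may also appear inside quotes; they only address splitting on special characters or whitespace.

If I change the string to the following, I do get the desired result: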

string = " section_category_name = (computer and equipment expense) and date >= 2015-01-01 and date <= 2015-03-31"
result = re.split(r"and+(?=[^()]*(?:\(|$))", string)

Desired result

[' section_category_name = (computer and equipment expense) ',
 ' date >= 2015-01-01 ',
 ' date <= 2015-03-31']

However, I need the function to leave 'and' unsplit when it is inside single quotes, not parentheses.

python

regex

string

Stack Overflow user

Answered 2015-12-23 23:33:26

The code below works and does not need a crazy regex.

import re

# We create a "lexer" using regex. This will match strings surrounded by single quotes,
# words without any whitespace in them, and the end of the string. We then use finditer()
# to grab all non-overlapping tokens.
lexer = re.compile(r"'[^']*'|[^ ]+|$")

string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"

results = []
buff = []

# Iterate through all the tokens our lexer identified and parse accordingly
for match in lexer.finditer(string):
    token = match.group(0) # group 0 is the entire matching string

    if token in ('and', ''):
        # Once we reach 'and' or the end of the string '' (matched by $)
        # We join all previous tokens with a space and add to our results.
        results.append(' '.join(buff))
        buff = [] # Reset for the next set of tokens
    else:
        buff.append(token)

print(results)
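
For reuse, the loop above can be wrapped in a function. The following is a minimal sketch, not part of the original answer: the name `split_outside_quotes` and the `keyword` parameter are hypothetical, and unlike the loop above it skips empty pieces (for example when the input is empty or ends with the keyword). It assumes single quotes are balanced and never escaped.

```python
import re

# Same token pattern as the answer above: a single-quoted run, a run of
# non-space characters, or the empty match at the end of the string.
LEXER = re.compile(r"'[^']*'|[^ ]+|$")


def split_outside_quotes(text, keyword="and"):
    """Split text on a standalone keyword, ignoring quoted occurrences.

    Hypothetical helper for illustration; assumes balanced single quotes.
    """
    results = []
    buff = []
    for match in LEXER.finditer(text):
        token = match.group(0)
        if token == keyword or token == "":
            # Keyword or end of string: flush the buffered tokens, if any.
            if buff:
                results.append(" ".join(buff))
            buff = []
        else:
            buff.append(token)
    return results


string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"
print(split_outside_quotes(string))
```

Because tokens are compared as whole words, a name such as `brand` is not split, which a plain `re.split('and', ...)` would do. Note that joining tokens with a single space normalizes runs of spaces outside quotes.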

Edit: here is a more concise version that effectively replaces the for loop above with itertools.groupby.

import re
from itertools import groupby

string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"

lexer = re.compile(r"'[^']*'|[^\s']+")
grouping = groupby(lexer.findall(string), lambda x: x == 'and')
results = [ ' '.join(g) for k, g in grouping if not k ]

print(results)
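
If a single `re.split` call is preferred, the lookahead idea from the question can be adapted from parentheses to single quotes. This is a sketch of a different technique from the lexer in this answer: it splits on `and` only when the rest of the string contains an even number of single quotes, which means the match is not inside an open quoted section. It assumes quotes are balanced and never escaped, and it can be slow on very long inputs because the lookahead rescans the remainder of the string for each candidate.

```python
import re

string = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"

# Match "and" surrounded by whitespace, but only if what follows consists of
# complete pairs of single quotes and then quote-free text up to the end.
pattern = re.compile(r"\s+and\s+(?=(?:[^']*'[^']*')*[^']*$)")

results = pattern.split(string)
print(results)
```

Unlike the token-based versions, this keeps the original spacing inside each piece, since nothing is re-joined.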

Votes 0

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/34444319
