Regular expression with double parentheses in Python

Stack Overflow user
Asked 2020-03-19 07:12:37
2 answers · 61 views · 0 followers · 0 votes

I am trying to parse some data in the following format, stored in a variable called data:

data = '''(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))'''

I want to split the data into a list, with the first block as the first entry and the second block as the second entry, i.e.:

['(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))',
'(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))']

I tried re.split(r'\(*(\([^()]*\)*)*\)', data), but the result is wrong and I don't understand why. Any help would be appreciated.
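One likely contributor to the surprising result can be sketched as follows, assuming the goal is one list entry per top-level (def-instance ...) block. `re.split` treats the pattern as the separator, and when the pattern contains a capturing group it also inserts the captured text into the result. The pattern in the question describes the parenthesized content itself rather than the gap between blocks, so that content is consumed as a separator. The short toy strings below are illustrative only.

```python
import re

# Without a capturing group, the separators are dropped.
print(re.split(r'-', 'a-b-c'))      # ['a', 'b', 'c']

# With a capturing group, the captured separator text is kept in the result.
print(re.split(r'(-)', 'a-b-c'))    # ['a', '-', 'b', '-', 'c']

# A group that does not take part in a match shows up as None.
print(re.split(r'(x)?-', 'a-b'))    # ['a', None, 'b']

data = '''(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))'''

# The pattern from the question matches the parenthesized text itself,
# so that text is treated as a separator and the list has more than two entries.
attempt = re.split(r'\(*(\([^()]*\)*)*\)', data)
print(len(attempt))
```

A pattern that matches only the whitespace between blocks, without capturing groups, avoids both effects.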


2 Answers

Stack Overflow user

Posted 2020-03-19 07:48:01

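You can do this by iterating over the lines of the data, searching for "))", and building the result list from the line indices where it is found.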

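# Split the text into lines (this rebinds data to a list of lines)
data = data.split('\n')

result = []
prev = 0  # index of the first line of the current block

for idx, value in enumerate(data):
    # A line containing '))' closes the current block
    if '))' in value:
        result.append('\n'.join(data[prev:idx + 1]))
        prev = idx + 1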

This outputs the following:

print(result)
#['(def-instance Adelphi\n   (state newyork)\n   (control private)\n   (no-of-students thous:5-10)\n   (male:female ratio:30:70)\n   (student:faculty ratio:15:1)\n   (sat verbal 500)\n   (sat math 475)\n   (expenses thous$:7-10)\n   (percent-financial-aid 60)\n   (no-applicants thous:4-7)\n   (percent-admittance 70)\n   (percent-enrolled 40)\n   (academics scale:1-5 2)\n   (social scale:1-5 2)\n   (quality-of-life scale:1-5 2)\n   (academic-emphasis business-administration)\n   (academic-emphasis biology))', '(def-instance Arizona-State\n   (state arizona)\n   (control state)\n   (no-of-students thous:20+)\n   (male:female ratio:50:50)\n   (student:faculty ratio:20:1)\n   (sat verbal 450)\n   (sat math 500)\n   (expenses thous$:4-7)\n   (percent-financial-aid 50)\n   (no-applicants thous:17+)\n   (percent-admittance 80)\n   (percent-enrolled 60)\n   (academics scale:1-5 3)\n   (social scale:1-5 4)\n   (quality-of-life scale:1-5 5)\n   (academic-emphasis business-education)\n   (academic-emphasis engineering)\n   (academic-emphasis accounting)\n   (academic-emphasis fine-arts))']

On the updated dataset:

result
#['(def-instance Adelphi\n   (expenses thous$:7-10)\n   (academic-emphasis biology))', '(def-instance Arizona-State\n   (expenses thous$:4-7)\n   (academic-emphasis fine-arts))']
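
The line-based loop above relies on every block ending on a line that contains "))". As a hedged alternative that is not part of the original answer, one could track parenthesis depth instead, so that a block ends wherever the depth returns to zero. The function name `split_top_level` is hypothetical, and the sketch assumes the parentheses are balanced and never appear inside quoted strings.

```python
def split_top_level(text):
    """Split text into top-level parenthesized blocks by tracking nesting depth."""
    blocks = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i  # a new top-level block begins here
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0 and start is not None:
                blocks.append(text[start:i + 1])  # the block just closed
                start = None
    return blocks


data = '''(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))'''

result = split_top_level(data)
print(len(result))  # 2
```

Counting depth does not depend on line breaks, so it would also cope with several blocks on one line, at the cost of scanning character by character.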
0 votes

Stack Overflow user

Posted 2020-03-19 07:52:42

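What the split positions have in common is that the previous 'set' ends with "))", a newline follows, and the next 'set' starts with "(". This suggests an approach using lookbehind and lookahead: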

import re

data = '''(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))'''

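# Split on whitespace that is preceded by '))' and followed by '('
parts = re.split(r'(?<=\)\))\s+(?=\()', data)  # re.split already returns a list
for item in parts:
    print(item)
    print()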

Output (each item printed separately for clarity):

(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))

(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))
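
To make the lookaround split reusable, it could be wrapped in a small helper. This is a sketch rather than part of the original answer: the names `BLOCK_GAP` and `split_instances` are hypothetical, and it assumes each block ends with "))" and the next one begins with "(", as in the sample data. Because lookarounds match positions without consuming characters, only the whitespace between blocks is removed.

```python
import re

# Whitespace preceded by '))' and followed by '('; the lookarounds consume nothing.
BLOCK_GAP = re.compile(r'(?<=\)\))\s+(?=\()')


def split_instances(text):
    """Return the top-level blocks of text, split at the gaps between them."""
    return BLOCK_GAP.split(text.strip())


data = '''(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))'''

parts = split_instances(data)
for part in parts:
    print(part.splitlines()[0])  # first line of each block
```

One caveat of this design: a nested entry that itself ends in "))" and is followed by another "(" entry inside the same block would also be treated as a split point, so the helper is only suitable for data shaped like the sample.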
0 votes
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/60748759
