Q: Pandas: how to repeat a label for each substring in a column

Stack Overflow user

Asked 2022-10-12 13:26:10

2 answers · 34 views · 0 followers · 3 votes

I have a pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'text': ['set an alarm for [time : two hours from now]','wake me up at [time : nine am] on [date : friday]','check email from [person : john]']})
print(df)

Original data:

                                                text
0       set an alarm for [time : two hours from now]
1  wake me up at [time : nine am] on [date : friday]
2                   check email from [person : john]

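If a bracketed list contains more than one value, I want to repeat the brackets and the label (date, time, or person) for every value in that list. So the output I want is:

Desired output: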

                                                new_text                                
0       set an alarm for [time : two] [time : hours] [time : from] [time : now]        
1  wake me up at [time : nine] [time : am] on [date : friday]  
2                   check email from [person : john]

So far I have tried separating the bracketed lists from the original column, but I don't know how to continue from there.

df['separated_list'] = df.text.str.split(r"\s(?![^[]*])|[|]").apply(lambda x: [y for y in x if '[' in y])

Tags: python, pandas, string, list

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-10-12 13:32:31

You can use a regex replacement with a custom function:

df['new_text'] = df.text.str.replace(
  r"\[([^\[\]]*?)\s*:\s*([^\[\]]*)\]",
  lambda m: ' '.join([f'[{m.group(1)} : {x}]'
                      for x in m.group(2).split()]), # new chunk for each word
  regex=True)

Output:

                                                text                                                                 new_text
0       set an alarm for [time : two hours from now]  set an alarm for [time : two] [time : hours] [time : from] [time : now]
1  wake me up at [time : nine am] on [date : friday]               wake me up at [time : nine] [time : am] on [date : friday]
2                   check email from [person : john]                                         check email from [person : john]
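To make the pattern in this answer easier to follow, here is a minimal sketch of the same replacement written with a commented, verbose regex and a plain Python function. The names `TAG`, `label`, `values`, and `expand_tags` are illustrative and not part of the original answer; the matching logic is intended to be equivalent.

```python
import re
import pandas as pd

# Verbose form of the pattern used in the answer above.
# The group names "label" and "values" are illustrative.
TAG = re.compile(r"""
    \[                      # opening bracket
    (?P<label>[^\[\]]*?)    # label such as time or date (lazy match)
    \s*:\s*                 # colon with optional surrounding whitespace
    (?P<values>[^\[\]]*)    # everything up to the closing bracket
    \]                      # closing bracket
""", re.VERBOSE)


def expand_tags(text):
    """Repeat the bracket and its label once for every word in the bracket."""
    def repl(m):
        return " ".join(
            f"[{m.group('label')} : {word}]"
            for word in m.group("values").split()
        )
    return TAG.sub(repl, text)


df = pd.DataFrame({'text': [
    'set an alarm for [time : two hours from now]',
    'wake me up at [time : nine am] on [date : friday]',
    'check email from [person : john]',
]})
df['new_text'] = df['text'].map(expand_tags)
print(df['new_text'].tolist())
```

Text outside the brackets is left untouched because `sub` only rewrites the matched spans. Note that a bracket with nothing after the colon would be replaced by an empty string in this sketch.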

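regex demo

Votes: 2

Stack Overflow user

Posted 2022-10-12 17:13:40

Use a lookbehind and a lookahead to locate the square brackets, a group of repeated character classes to grab the content between them, then split that content on the colon: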

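import re
import pandas as pd

df = pd.DataFrame({'text': ['set an alarm for [time : two hours from now]','wake me up at [time : nine am] on [date : friday]','check email from [person : john]']})
data = df['text']
for item in data:
    print(item)
    # The lookbehind and lookahead keep the brackets themselves out of each match
    matches = re.findall(r'(?<=\[)(?:[\w+\s*]+\:[\w+\s*]+)(?=\])', item)
    for match in matches:
        # parts[0] is the label, parts[1] holds the values (whitespace not stripped)
        parts = match.split(":")
        print(parts)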

Output:

set an alarm for [time : two hours from now]
['time ', ' two hours from now']
wake me up at [time : nine am] on [date : friday]
['time ', ' nine am']
['date ', ' friday']
check email from [person : john]
['person ', ' john']
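The loop above stops at the split `[label, values]` pairs. One possible way to go from those pairs to the desired text is sketched below. It reuses the same `findall` pattern, and the function name `expand_with_findall` is hypothetical. It assumes each bracket contains exactly one colon, which the pattern itself requires.

```python
import re


def expand_with_findall(item):
    """Rebuild the string with one [label : word] chunk per word."""
    # Same pattern as in the answer above: content between [ and ],
    # with exactly one colon separating the label from the values.
    matches = re.findall(r'(?<=\[)(?:[\w+\s*]+\:[\w+\s*]+)(?=\])', item)
    for match in matches:
        # Strip the whitespace that split(":") leaves around each part
        label, values = (part.strip() for part in match.split(":"))
        replacement = " ".join(f"[{label} : {word}]" for word in values.split())
        # Swap the original bracketed chunk for the expanded version (first occurrence only)
        item = item.replace(f"[{match}]", replacement, 1)
    return item


texts = [
    'set an alarm for [time : two hours from now]',
    'wake me up at [time : nine am] on [date : friday]',
    'check email from [person : john]',
]
for text in texts:
    print(expand_with_findall(text))
```

With a DataFrame, the same function could be applied as `df['text'].map(expand_with_findall)`.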

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/74042649
