
Python: Rebuilding a nested list

Stack Overflow user
Asked on 2018-10-18 03:17:23
Answers: 1  Views: 75  Followers: 0  Votes: 0

I have a nested list:

output = [('the', 'B', 'NNP'), ('wall', 'I', 'NNP'), ('street', 'I', 'NNP'), ('journal', 'I', 'NNP'), ('reported', 'O', 'VB'), ('today', 'O', 'NNP'), ('that', 'O', 'NNP'), ('apple', 'B', 'NNP'), ('corporation', 'I', 'NNP'), ('made', 'O', 'VB'), ('money', 'O', 'NNP'), ('.', 'O', '.'), ('georgia', 'B', 'NNP'), ('tech', 'I', 'NNP'), ('is', 'O', 'NNP'), ('a', 'O', '.'), ('university', 'O', 'NNP'), ('in', 'O', 'NNP'), ('georgia', 'B', 'NNP'), ('.', 'O', '.')]

I want to reshape it into the following format:

new_output= [(['the', 'wall', 'street', 'journal', 'reported', 'today', 'that', 'apple', 'corporation', 'made', 'money'], ['B', 'I', 'I', 'I', 'O', 'O', 'O', 'B', 'I', 'O', 'O']), (['georgia', 'tech', 'is', 'a', 'university', 'in', 'georgia'], ['B', 'I', 'O', 'O', 'O', 'O', 'B'])]

My attempt:

import string
word = []
token = []
result_word = []
result_token = []

result = []
for i in output[0]:
    for every_word in i:
        word.append(every_word)
result_word = " ".join(" ".join(word).split()[::3])

How can I get the format I want?
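Tracing the attempt shows why it cannot reach that format: `output[0]` is only the first tuple, `('the', 'B', 'NNP')`, and iterating over a string yields its individual characters. The sketch below reruns the attempt on a shortened copy of the data (only the first tuple matters to the loop):

```python
# Shortened copy of the question's data; the loop only ever looks at output[0].
output = [('the', 'B', 'NNP'), ('wall', 'I', 'NNP')]

word = []
for i in output[0]:          # i is 'the', then 'B', then 'NNP'
    for every_word in i:     # iterating a str yields single characters
        word.append(every_word)

print(word)                  # ['t', 'h', 'e', 'B', 'N', 'N', 'P']

# Every third character, joined with spaces
result_word = " ".join(" ".join(word).split()[::3])
print(result_word)           # t B P
```

So the loop never reaches the other tuples, and slicing characters with `[::3]` has no relation to the (word, tag, POS) fields.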


1 Answer

Stack Overflow user

Posted on 2018-10-18 03:37:16

You can use groupby to collect the runs of non-period items into sentences, then use zip to separate the words from the B/I/O tags:

from itertools import groupby

l = [('the', 'B', 'NNP'), ('wall', 'I', 'NNP'), ('street', 'I', 'NNP'), ('journal', 'I', 'NNP'), ('reported', 'O', 'VB'), ('today', 'O', 'NNP'), ('that', 'O', 'NNP'), ('apple', 'B', 'NNP'), ('corporation', 'I', 'NNP'), ('made', 'O', 'VB'), ('money', 'O', 'NNP'), ('.', 'O', '.'), ('georgia', 'B', 'NNP'), ('tech', 'I', 'NNP'), ('is', 'O', 'NNP'), ('a', 'O', '.'), ('university', 'O', 'NNP'), ('in', 'O', 'NNP'), ('georgia', 'B', 'NNP'), ('.', 'O', '.')]

# Keep only the runs of rows whose word is not '.', i.e. the sentences
groups = (g for k, g in groupby(l, lambda x: x[0] != '.') if k)
# zip(*g) transposes one sentence into (words, B/I/O tags, POS tags)
zs = (zip(*g) for g in groups)
# Take the first two transposed rows (words and tags); the POS row is ignored
res = [(next(z), next(z)) for z in zs]

Then res is:

[(('the', 'wall', 'street', 'journal', 'reported', 'today', 'that', 'apple', 'corporation', 'made', 'money'), 
  ('B', 'I', 'I', 'I', 'O', 'O', 'O', 'B', 'I', 'O', 'O')), 
 (('georgia', 'tech', 'is', 'a', 'university', 'in', 'georgia'), 
  ('B', 'I', 'O', 'O', 'O', 'O', 'B'))
]
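Note that res holds tuples, while new_output in the question uses lists. A minimal plain-loop sketch that builds lists directly is shown here; `split_sentences` and its `end_token` parameter are names made up for this illustration. Like the answer, it tests the word (first field) rather than the POS tag, so `('a', 'O', '.')` stays inside its sentence.

```python
# The question's data, unchanged.
output = [('the', 'B', 'NNP'), ('wall', 'I', 'NNP'), ('street', 'I', 'NNP'), ('journal', 'I', 'NNP'), ('reported', 'O', 'VB'), ('today', 'O', 'NNP'), ('that', 'O', 'NNP'), ('apple', 'B', 'NNP'), ('corporation', 'I', 'NNP'), ('made', 'O', 'VB'), ('money', 'O', 'NNP'), ('.', 'O', '.'), ('georgia', 'B', 'NNP'), ('tech', 'I', 'NNP'), ('is', 'O', 'NNP'), ('a', 'O', '.'), ('university', 'O', 'NNP'), ('in', 'O', 'NNP'), ('georgia', 'B', 'NNP'), ('.', 'O', '.')]


def split_sentences(rows, end_token='.'):
    """Turn (word, tag, pos) rows into ([words], [tags]) pairs, one per sentence.

    Hypothetical helper: a row whose word equals end_token closes the
    current sentence and is itself dropped, as in the groupby answer.
    """
    result = []
    words, tags = [], []
    for word, tag, _pos in rows:
        if word == end_token:
            if words:  # ignore empty runs such as two consecutive '.'
                result.append((words, tags))
            words, tags = [], []
        else:
            words.append(word)
            tags.append(tag)
    if words:  # keep a final sentence that has no trailing '.'
        result.append((words, tags))
    return result


new_output = split_sentences(output)
print(new_output)
```

The result equals new_output from the question. If you prefer to keep the groupby version, wrapping each transposed row in list(...) gives the same lists.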
Votes: 2
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/52862121
