
Separating text blocks in Python

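Stack Overflow user
Asked 2018-05-30 08:52:02
3 answers · 1.3K views · 0 followers · 0 votes

I would like to know how to separate blocks of text that sit in the same text file. An example is shown below. Basically I have two items: one runs from "Channel 9" to the "Brief:..." line, and the other starts with "Southern ..." and again ends at its "Brief" line. How can I split them into two text files with Python? I guess the most common delimiter would be "(female 16+)". Many thanks!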

Channel 9 (1 item)

A woman selling her caravan near Bendigo has been left 
$1,100 out
hosted by Peter Hitchener
A woman selling her caravan near Bendigo has been left $1,100 out of 
pocket after an elderly couple made the purchase with counterfeit money. 
The wildlife worker tried to use the notes to pay for a house deposit, but an 
agent noticed the notes were missing the Coat of Arms on one side. 


Brief: Radio & TV
Demographics: 153,000 (male 16+) • 177,000 (female 
16+)

Southern Cross Victoria Bendigo (1 item)


Heathcote Police are warning the residents to be on the 
lookout a
hosted by Jo Hall
Heathcote Police are warning the residents to be on the lookout after a large 
dash of fake $50 note was discovered. Victim Marianne Thomas was given 
counterfeit notes from a caravan. The Heathcote resident tried to pay the 
house deposit and that's when the counterfeit notes were spotted. Thomas 
says the caravan is in town for the Spanish Festival.


Brief: Radio & TV
Demographics: 4,000 (male 16+) • 3,000 (female 16+)

3 Answers

Stack Overflow user

Accepted answer

Posted 2018-05-30 09:24:25

This uses the first line of each section as the file name.

#!/usr/bin/env python
import re

data = """
Channel 9 (1 item)

A woman selling her caravan near Bendigo has been left $1,100 out hosted by
Peter Hitchener A woman selling her caravan near Bendigo has been left $1,100
out of pocket after an elderly couple made the purchase with counterfeit money.
The wildlife worker tried to use the notes to pay for a house deposit, but an
agent noticed the notes were missing the Coat of Arms on one side.

Brief: Radio & TV Demographics: 153,000 (male 16+) • 177,000 (female 16+)

Southern Cross Victoria Bendigo (1 item)

Heathcote Police are warning the residents to be on the lookout a hosted by Jo
Hall Heathcote Police are warning the residents to be on the lookout after a
large dash of fake $50 note was discovered. Victim Marianne Thomas was given
counterfeit notes from a caravan. The Heathcote resident tried to pay the house
deposit and that's when the counterfeit notes were spotted. Thomas says the
caravan is in town for the Spanish Festival.

Brief: Radio & TV Demographics: 4,000 (male 16+) • 3,000 (female 16+)
"""



current_file = None
for line in data.split('\n'):

    # Start a new file: the first non-blank line of a section is its name
    if current_file is None and line != '':
        current_file = line + '.txt'

    # Skip blank lines between sections (e.g. the one after the Brief line)
    if current_file is None:
        continue

    # Append, so every line of the section lands in the same file
    with open(current_file, "a") as text_file:
        text_file.write(line + "\n")

    # Reset the filename once the section is finished,
    # which is identified by a line that:
    #    starts with Brief: - ^Brief:
    #    contains any amount of text - .*
    #    ends with ) - \)$
    if re.match(r'^Brief:.*\)$', line) is not None:
        current_file = None

This produces the following files:

Channel 9 (1 item).txt
Southern Cross Victoria Bendigo (1 item).txt
Votes: 2

Stack Overflow user

Posted 2018-05-30 09:02:08

Here is a hard-coded way to do it:

s = """Channel 9 (1 item)

A woman selling her caravan near Bendigo has been left $1,100 out hosted by Peter Hitchener A woman selling her caravan near Bendigo has been left $1,100 out of pocket after an elderly couple made the purchase with counterfeit money. The wildlife worker tried to use the notes to pay for a house deposit, but an agent noticed the notes were missing the Coat of Arms on one side.

Brief: Radio & TV Demographics: 153,000 (male 16+) • 177,000 (female 16+)

Southern Cross Victoria Bendigo (1 item)

Heathcote Police are warning the residents to be on the lookout a hosted by Jo Hall Heathcote Police are warning the residents to be on the lookout after a large dash of fake $50 note was discovered. Victim Marianne Thomas was given counterfeit notes from a caravan. The Heathcote resident tried to pay the house deposit and that's when the counterfeit notes were spotted. Thomas says the caravan is in town for the Spanish Festival.

Brief: Radio & TV Demographics: 4,000 (male 16+) • 3,000 (female 16+)"""

part_1 = s[s.index("Channel 9"):s.index("Southern Cross")]

part_2 = s[s.index("Southern Cross"):]

Then save them to files.
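The saving step is not shown in this answer. A minimal sketch of one way to do it follows, assuming each part should be written to a file named after its first line (the convention used in the accepted answer). The helper name `save_parts` and its `out_dir` parameter are hypothetical, and the sketch assumes the first line contains no characters that are invalid in a file name, such as `/`.

```python
from pathlib import Path


def save_parts(parts, out_dir="."):
    """Write each non-empty text block to '<first line>.txt' in out_dir.

    Hypothetical helper: the naming scheme is an assumption, not part
    of the original answer.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for part in parts:
        text = part.strip()
        if not text:
            continue  # ignore empty blocks
        # The first line of the block becomes the file name
        title = text.splitlines()[0].strip()
        path = out_dir / f"{title}.txt"
        path.write_text(text + "\n", encoding="utf-8")
        written.append(path)
    return written
```

With the variables from the snippet above, this could be called as `save_parts([part_1, part_2])`.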

Votes: 1

Stack Overflow user

Posted 2018-05-30 09:02:45

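It looks like the lines starting with "Demographics:" are the real separators. I would use regular expressions in two ways: first, to split the text on those lines; second, to extract the lines themselves. The two results can then be combined to rebuild the blocks: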

import re
# `text` is assumed to hold the full contents of the input file
DIVIDER = 'Demographics: .+' # Make it tunable, in case you change your mind
blocks_1 = re.split(DIVIDER, text)
blocks_2 = re.findall(DIVIDER, text)
blocks = ['\n\n'.join(pair) for pair in zip(blocks_1, blocks_2)]
blocks[0]
#Channel 9 (1 item)\n\nA woman selling her caravan near ... 
#... Demographics: 153,000 (male 16+) • 177,000 (female 16+)
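In the sample posted in the question, the first "Demographics:" line is wrapped so that "16+)" falls on the following line, and `.+` does not match across a line break. The sketch below is one possible variant under that assumption: it treats "(female 16+)" as the end of a block, as the question suggests, and allows any whitespace (including a newline) inside it. The names `END_OF_BLOCK` and `split_blocks` are hypothetical, and the story text in `sample` is shortened placeholder data.

```python
import re

# Assumed end-of-block marker: the Demographics entry up to "(female 16+)".
# \s+ tolerates a line break between "female" and "16+)"; re.DOTALL lets
# .*? run across that line break as well.
END_OF_BLOCK = re.compile(r"Demographics:.*?\(female\s+16\+\)", re.DOTALL)


def split_blocks(text):
    """Return the blocks of text, each ending at its Demographics entry."""
    blocks = []
    start = 0
    for match in END_OF_BLOCK.finditer(text):
        # A block runs from the end of the previous one to this marker
        blocks.append(text[start:match.end()].strip())
        start = match.end()
    return blocks


sample = """Channel 9 (1 item)

First story.

Brief: Radio & TV
Demographics: 153,000 (male 16+) • 177,000 (female 
16+)

Southern Cross Victoria Bendigo (1 item)

Second story.

Brief: Radio & TV
Demographics: 4,000 (male 16+) • 3,000 (female 16+)
"""

blocks = split_blocks(sample)
```

Using `finditer` with match positions avoids having to zip the results of `re.split` and `re.findall` back together, at the cost of tying the pattern to the "(female 16+)" wording.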
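Votes: 1

Original page content provided by Stack Overflow. Chinese translation support was provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: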

https://stackoverflow.com/questions/50594925
