Split a string after every 8 words. If the 8th word does not end with . or !, move on to the next word that does.
I am able to split the string into words.
with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        listword = [' '.join(text[i:i+n]) for i in range(0, len(text), n)]
        for lsb in listword:
            print(lsb)
The expected output is
I'm going to the mall for breakfast, Please meet me there for lunch.
The duration of the next. He figured I was only joking!
I brought back the time.
This is what I get
I'm going to the mall for breakfast, Please
meet me there for lunch. The duration of
the next. He figured I was only joking!
I brought back the time.
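Posted on 2019-05-21 03:19:07
You are adding line breaks to a sequence of words. The primary condition for a line break is that the last word ends with . or !, and the secondary condition is a minimum length (8 words or more). The code below collects words in a buffer until the condition for printing a line is met.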
with open("file.txt") as c:
    out = []
    for line in c:
        for word in line.split():
            out.append(word)
            if word.endswith(('.', '!')) and len(out) >= 8:
                print(' '.join(out))
                out.clear()
    # don't forget to flush the buffer
    if out:
        print(' '.join(out))
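To check the buffering idea without a file, the same logic can be wrapped in a function and run on the sample text from the question. This is only a sketch: the function name split_on_sentence_end, the min_words parameter, and the sample variable are made up for illustration.

```python
def split_on_sentence_end(text, min_words=8):
    """Group words into lines of at least min_words that end with '.' or '!'."""
    lines, buffer = [], []
    for word in text.split():
        buffer.append(word)
        # break only when the line is long enough AND the word ends a sentence
        if len(buffer) >= min_words and word.endswith(('.', '!')):
            lines.append(' '.join(buffer))
            buffer = []
    if buffer:  # flush words left over after the last break
        lines.append(' '.join(buffer))
    return lines


# sample text assembled from the expected output in the question
sample = ("I'm going to the mall for breakfast, Please meet me there for lunch. "
          "The duration of the next. He figured I was only joking! "
          "I brought back the time.")

for line in split_on_sentence_end(sample):
    print(line)
```

Returning a list instead of printing directly makes the result easy to compare against the expected output.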
Posted on 2019-05-21 03:29:27
It looks like you never told your code to look for . or !, only to split the text into chunks of 8 words. Here is one solution:
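buffer = []
output = []
with open("file.txt") as c:
    # a file object has no split(); read the text first, then split on whitespace
    for word in c.read().split():
        buffer.append(word)
        # parentheses are required: 'and' binds tighter than 'or'
        if ('!' in word or '.' in word) and len(buffer) > 7:
            output.append(' '.join(buffer))
            buffer = []
if buffer:  # keep trailing words that never met the condition
    output.append(' '.join(buffer))
print(output)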
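It splits the text into a list of words. It adds each word to buffer until your condition is met (word contains the punctuation and the buffer holds more than 7 words). It then appends that buffer to output and clears the buffer.
I don't know how your file is structured, so I tested this on the text as one long string of sentences. You may need to adjust the input so that it arrives in the form this code expects.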
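On adjusting the input: if the text spans several lines, splitting on a single space character leaves the newline attached to neighbouring words, while split() with no argument splits on any run of whitespace. A small illustration, using a made-up sample string:

```python
# hypothetical two-line fragment; '\n' stands in for a line break in the file
text = "for lunch.\nThe duration"

words_space = text.split(" ")  # splits on the space character only
words_any = text.split()       # splits on any whitespace, including '\n'

print(words_space)  # ['for', 'lunch.\nThe', 'duration']
print(words_any)    # ['for', 'lunch.', 'The', 'duration']
```

With the first form, 'lunch.\nThe' counts as one word, so the word count is off and an endswith check on it would fail.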
Posted on 2019-05-21 03:18:49
I'm not sure how to do this with a list comprehension, but you can do it with a regular for loop.
with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        temp = []
        listword = []
        for val in text:
            temp.append(val)
            # break once the line has at least n words and ends a sentence
            if len(temp) >= n and val.endswith(('.', '!')):
                listword.append(' '.join(temp))
                temp = []
        if temp:  # append any leftover words that never reached a break
            listword.append(' '.join(temp))
        for lsb in listword:
            print(lsb)
https://stackoverflow.com/questions/56226602