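I have a file in which a pattern repeats every 4 lines. I want to go through the file and, whenever the second line of a 4-line block is longer than 2000 characters, write that whole 4-line block to an output file.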
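First, I should say that I'm a biologist, not a computer scientist, so I'm fairly new to programming. I'm trying to use enumerate to keep track of which line I'm on, and since enumerate returns an iterator, I believed I could call next() on it. However, when I run the code below, I end up printing all of the lines, even though in theory it should print only the third and fourth lines of each qualifying group. That's the confusing part.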
with open('file', 'r') as f:
    for i, line in enumerate(f, 1):
        if i % 4 == 1:
            first_line = line
        if i % 4 == 2:
            if len(line.strip()) > 2000:
                seq_line = line
                third_line = next(f)
                fourth_line = next(f)
                print(third_line)
                print(fourth_line)
            else:
                pass
Even when I try something as simple as:
with open('file', 'r') as f:
    for i, line in enumerate(f, 1):
        if i % 4 == 1:
            first_line = line
        if i % 4 == 2:
            print(line)
            print(next(f))
I still end up with all of the lines, and I still don't understand why.
Thanks.
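A minimal sketch of what goes wrong in the snippets above, using a short in-memory list iterator in place of the file (the list contents are made up for illustration). A file object is its own iterator, so `enumerate(f)` and `next(f)` pull from the same stream: every `next(f)` call consumes a line that `enumerate` never counts, and `i` falls one line behind the true line number each time.

```python
# Stand-in for the file: two hypothetical 4-line blocks.
lines = iter(['r1_l1', 'r1_l2', 'r1_l3', 'r1_l4',
              'r2_l1', 'r2_l2', 'r2_l3', 'r2_l4'])

seen = []
for i, line in enumerate(lines, 1):
    if i % 4 == 2:
        # next() on the raw iterator consumes a line that enumerate never counts
        skipped = next(lines)
        seen.append((i, line, skipped))
    else:
        seen.append((i, line))

for entry in seen:
    print(entry)
```

In this run `i == 4` is paired with `r2_l1`, the first line of the second block, which is why the `i % 4` tests stop lining up with the 4-line pattern after the first skip.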
Posted on 2019-06-19 04:51:53
I wouldn't bother using enumerate or next at all.
with open('file', 'r') as f:
    # keep going until we exhaust the file
    while True:
        # read the next four lines of the file
        line1 = f.readline()
        line2 = f.readline()
        line3 = f.readline()
        line4 = f.readline()
        # readline() returns '' (not even '\n') only at end of file,
        # so stop there; an incomplete final block is ignored
        if not line1 or not line2 or not line3 or not line4:
            break
        # if line2 is long enough (not counting its newline), print the block;
        # end='' avoids doubling the newline each line already carries
        if len(line2.rstrip('\n')) > 2000:
            print(line1, end='')
            print(line2, end='')
            print(line3, end='')
            print(line4, end='')
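A more compact variant of the same four-lines-at-a-time idea, offered as a sketch rather than as this answer's own code: zipping four references to one iterator pulls four consecutive lines into each tuple, and, like the `readline()` loop, silently drops an incomplete final block. The function name `filter_blocks`, the `min_len` parameter and the sample text are hypothetical, and `io.StringIO` objects stand in for real files.

```python
import io

def filter_blocks(src, dst, min_len=2000):
    """Write every 4-line block whose second line (minus its newline) exceeds min_len."""
    it = iter(src)
    # zip draws from the same iterator four times, yielding consecutive 4-line tuples
    for block in zip(it, it, it, it):
        if len(block[1].rstrip('\n')) > min_len:
            dst.writelines(block)

# Usage with in-memory files (hypothetical data, small threshold for readability)
src = io.StringIO("h1\nAAAAAAAA\nx1\ny1\nh2\nAAA\nx2\ny2\n")
dst = io.StringIO()
filter_blocks(src, dst, min_len=5)
print(dst.getvalue(), end='')
```

With real files it could be called as `with open('file') as f_in, open('file_out.txt', 'w') as f_out: filter_blocks(f_in, f_out)`.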
Posted on 2019-06-19 05:07:00
Use the re module to find the 4-line blocks (regex101):
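import re

with open('file.txt', 'r') as f_in, \
     open('file_out.txt', 'w') as f_out:
    # each match is four consecutive non-empty lines (the last may end at end of file)
    for g in re.finditer(r'([^\n]+(?:\n|\Z)){4}', f_in.read(), flags=re.DOTALL):
        # g[0] is the whole 4-line match; check the length of its second line
        if len(g[0].splitlines()[1]) > 2000:
            f_out.write(g[0])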
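The regex approach can be checked on a small string before running it on the real file. This sketch wraps the answer's pattern in a hypothetical helper, `long_blocks`, with a lowered threshold and made-up sample text. `re.DOTALL` is omitted because the pattern contains no `.`, so the flag has no effect. Note that `[^\n]+` cannot match an empty line, so this assumes the file contains no blank lines, and that `f_in.read()` in the answer loads the whole file into memory.

```python
import re

# Same pattern as the answer: four consecutive non-empty lines
BLOCK_RE = re.compile(r'([^\n]+(?:\n|\Z)){4}')

def long_blocks(text, min_len=2000):
    """Return the 4-line blocks in text whose second line exceeds min_len characters."""
    return [m[0] for m in BLOCK_RE.finditer(text)
            if len(m[0].splitlines()[1]) > min_len]

# Hypothetical sample; the last block has no trailing newline and is matched via \Z
sample = "h1\nAAAAAAAA\nx1\ny1\nh2\nAAA\nx2\ny2"
print(long_blocks(sample, min_len=5))
```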
Posted on 2019-06-19 04:48:53
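enumerate() also returns an iterator. You can assign it to a variable and call next() on that variable instead of on the file iterator. Because each next() on the enumerate object also advances its counter, the index i stays in step with the actual line number.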
with open('file', 'r') as f:
    lines = enumerate(f, 1)
    for i, line in lines:
        if i % 4 == 1:
            first_line = line
        elif i % 4 == 2:
            if len(line.strip()) > 2000:
                seq_line = line
                # advance the enumerate object, not f, so i stays in sync
                _, third_line = next(lines)
                _, fourth_line = next(lines)
                print(third_line)
                print(fourth_line)
            else:
                pass
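A sketch extending this answer to what the question actually asked, writing the whole 4-line block to an output file. The helper name `copy_long_blocks`, the `min_len` parameter and the sample data are hypothetical, and `io.StringIO` objects stand in for real files. The `(None, '')` default passed to `next()` is an added assumption so that a truncated final block is written as far as it goes instead of raising `StopIteration`.

```python
import io

def copy_long_blocks(src, dst, min_len=2000):
    """Write each 4-line block whose second line (stripped) is longer than min_len."""
    lines = enumerate(src, 1)  # one shared iterator: counter and lines stay in sync
    for i, line in lines:
        if i % 4 == 1:
            first_line = line
        elif i % 4 == 2 and len(line.strip()) > min_len:
            # next() on the enumerate object advances the counter as well;
            # the default avoids StopIteration if the file ends mid-block
            _, third_line = next(lines, (None, ''))
            _, fourth_line = next(lines, (None, ''))
            dst.write(first_line + line + third_line + fourth_line)

# Usage with in-memory files (hypothetical data, small threshold for readability)
src = io.StringIO("h1\nAAAAAAAA\nx1\ny1\nh2\nAAA\nx2\ny2\n")
dst = io.StringIO()
copy_long_blocks(src, dst, min_len=5)
print(dst.getvalue(), end='')
```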
https://stackoverflow.com/questions/56656705