I want to add an identifier to identical consecutive lines in a text file. For example, I have the following input file:
Apple
Apple
Apple
Banana
Banana
Pineapple
Pineapple
Pineapple
Pineapple
I want my output to look like this:
Apple_number_1
Apple_number_2
Apple_number_3
Banana_number_1
Banana_number_2
Pineapple_number_1
Pineapple_number_2
Pineapple_number_3
Pineapple_number_4
I have some code that compares each line with the previous one and prints the line:
    my_file = open('/Users/Jo/Desktop/for_building.txt')
    lines = my_file.readlines()

    def lines_equal(curr_line, prev_line, compare_char):
        curr_line_parts = curr_line.split(' ')
        prev_line_parts = prev_line.split(' ')
        for item in zip(curr_line_parts, prev_line_parts):
            if item[0].startswith(compare_char):
                return item[0] == item[1]

    results = []
    prev_line = lines[0]
    for line in lines[1:]:
        results.append(lines_equal(line, prev_line, 'Z'))
        prev_line = line
        print(prev_line)
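How can I add the identifier at the end? I think I would use a while loop, but it gets tricky when the while loop is nested inside the for loop. Is there a clever way around this?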
Answered on 2019-07-11 02:58:25
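I would use a defaultdict, which keeps a count for each line, starting from zero (the default) and incrementing each time the same line is encountered: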
    from collections import defaultdict

    lineCounts = defaultdict(int)  # missing keys start at 0
    for line in lines:
        line = line.rstrip('\n')  # readlines() keeps the trailing newline
        lineCounts[line] += 1
        print('{}_number_{}'.format(line, lineCounts[line]))
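One thing worth checking with the dictionary approach: it counts every occurrence of a line in the whole file, not just the current consecutive run. For the sample input the two give the same result, but they diverge as soon as a value reappears later. The sketch below compares the two behaviours; the function names and the sample list are made up for illustration and are not part of the answer above.

```python
from collections import defaultdict


def number_by_total_count(lines):
    """Number each line by how many times it has appeared so far, anywhere."""
    counts = defaultdict(int)
    out = []
    for line in lines:
        counts[line] += 1
        out.append(f"{line}_number_{counts[line]}")
    return out


def number_by_consecutive_run(lines):
    """Number each line by its position within the current run of equal lines."""
    out = []
    prev, run = None, 0
    for line in lines:
        # Extend the run if the line repeats, otherwise start a new run at 1.
        run = run + 1 if line == prev else 1
        out.append(f"{line}_number_{run}")
        prev = line
    return out


# Hypothetical input where "Apple" comes back after a different line.
sample = ["Apple", "Apple", "Banana", "Apple"]
print(number_by_total_count(sample))      # last item is Apple_number_3
print(number_by_consecutive_run(sample))  # last item is Apple_number_1
```

If the file is guaranteed to be grouped, as in the question, either version should work; otherwise the run-based one is probably what "identical consecutive lines" calls for.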
Answered on 2019-07-11 03:45:29
    from itertools import groupby

    with open("data.txt", "r") as file:
        lines = file.read().splitlines()

    # groupby yields one group per run of identical consecutive lines
    groups = [list(group) for _, group in groupby(lines)]
    for group in groups:
        for index, fruit in enumerate(group, start=1):
            print(f"{fruit}_number_{index}")
Output:
Apple_number_1
Apple_number_2
Apple_number_3
Banana_number_1
Banana_number_2
Pineapple_number_1
Pineapple_number_2
Pineapple_number_3
Pineapple_number_4
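The same groupby idea can probably be written without building the intermediate list of groups, since each group can be consumed as it is produced. This is only a sketch: number_runs is a hypothetical helper name, and the in-memory fruits list stands in for the lines read from the file.

```python
from itertools import groupby


def number_runs(lines):
    """Yield 'text_number_k' for the k-th line of each run of equal lines."""
    for _, run in groupby(lines):
        # Each run is consumed before groupby advances to the next one.
        for index, text in enumerate(run, start=1):
            yield f"{text}_number_{index}"


# Stand-in for the file contents from the question.
fruits = ["Apple"] * 3 + ["Banana"] * 2 + ["Pineapple"] * 4
for labelled in number_runs(fruits):
    print(labelled)
```

Because it is a generator, it should also accept an open file object directly, provided the lines are stripped first, for example number_runs(line.rstrip("\n") for line in file).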
Answered on 2019-07-11 03:06:49
A simple iterative approach:
    with open('file.txt') as f:
        cnt = 1  # initial counter value
        prev_line = None
        for line in f:
            # strip first, so a final line without a newline still matches its run
            line = line.strip()
            if prev_line is not None and line != prev_line:
                cnt = 1  # resetting counter
            print('{}_number_{}'.format(line, cnt))
            prev_line = line
            cnt += 1
Output:
Apple_number_1
Apple_number_2
Apple_number_3
Banana_number_1
Banana_number_2
Pineapple_number_1
Pineapple_number_2
Pineapple_number_3
Pineapple_number_4
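If the numbered lines should end up in a file rather than on the console, the same counter can be used while streaming from one file to another. A minimal sketch, assuming a hypothetical helper number_file and caller-supplied paths; neither appears in the answers above.

```python
def number_file(src_path, dst_path):
    """Copy src_path to dst_path, suffixing each line with its run position."""
    prev, run = None, 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for raw in src:
            line = raw.rstrip("\n")  # compare text without the newline
            # Extend the run if the line repeats, otherwise restart at 1.
            run = run + 1 if line == prev else 1
            dst.write(f"{line}_number_{run}\n")
            prev = line
```

Only one line is held in memory at a time, so this should cope with large inputs. A call might look like number_file("data.txt", "numbered.txt"), where both file names are placeholders.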
https://stackoverflow.com/questions/56976962