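I have a delimited text file, and I'm trying to generate every pairwise combination of the items on each line, tagging each item in a pair with the number of the line it came from.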
Here is a sample (you can also download it here if you want: https://gist.github.com/anonymous/4107418c63b88c6da44281a8ae7a321f):
"A,B "
"AFD,DNGS,SGDH "
"NHYG,QHD,lkd,uyete"
"AFD,TTT"
I want output like this:
A_1 B_1
AFD_2 DNGS_2
AFD_2 SGDH_2
DNGS_2 SGDH_2
NHYG_3 QHD_3
NHYG_3 lkd_3
NHYG_3 uyete_3
QHD_3 lkd_3
QHD_3 uyete_3
lkd_3 uyete_3
AFD_4 TTT_4
This means that A_1 and B_1 come from the first line, AFD_2 and DNGS_2 come from the second line, and so on.
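As a quick sanity check on the expected output: a line with k items yields C(k, 2) = k(k-1)/2 pairs, so the four sample lines (2, 3, 4 and 2 items) should produce 1 + 3 + 6 + 1 = 11 output lines, which matches the listing above. A small sketch of that check, assuming the sample lines are hard-coded with their surrounding quotes already removed (`math.comb` needs Python 3.8+):

```python
from math import comb

# Sample lines from the question, quotes removed (assumption for this sketch)
sample = ["A,B", "AFD,DNGS,SGDH", "NHYG,QHD,lkd,uyete", "AFD,TTT"]

# Number of comma-separated items on each line
sizes = [len(line.split(",")) for line in sample]

# A line with k items contributes C(k, 2) = k*(k-1)/2 pairs
pair_counts = [comb(k, 2) for k in sizes]

print(sizes)             # [2, 3, 4, 2]
print(pair_counts)       # [1, 3, 6, 1]
print(sum(pair_counts))  # 11
```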
I have tried the following, but I can't figure it out:
#!/usr/bin/python
import itertools
# make my output
out = {}
# give a name to my data
file_name = 'data.txt'
# read all the lines
for n, line in enumerate(open(file_name).readlines()):
    # split each line by comma
    item1 = line.split('\t')
    # split each string from another one by a comma
    item2 = item1.split(',')
    # iterate over all combinations of 2 strings
    for i in itertools.combinations(item2,2):
        # save the data into out
        out.write('\t'.join(i))
Output from answer 1:
"A_1, B "_1
"AFD_2, DNGS_2
"AFD_2, SGDH "_2
DNGS_2, SGDH "_2
"NHYG_3, QHD_3
"NHYG_3, lkd_3
"NHYG_3, uyete"_3
QHD_3, lkd_3
QHD_3, uyete"_3
lkd_3, uyete"_3
"AFD_4, TTT"_4
Output from answer 2:
"A_1 B "_1
"AFD_2 DNGS_2
"AFD_2 SGDH "_2
DNGS_2 SGDH "_2
"NHYG_3 QHD_3
"NHYG_3 lkd_3
"NHYG_3 uyete"_3
QHD_3 lkd_3
QHD_3 uyete"_3
lkd_3 uyete"_3
"AFD_4 TTT"_4
Posted on 2016-12-19 00:10:54
Try this:
#!/usr/bin/python
from itertools import combinations
with open('data1.txt') as f:
    result = []
    for n, line in enumerate(f, start=1):
        items = line.strip().split(',')
        # one inner list per pair, each item tagged with its line number
        x = [['%s_%d' % (s, n) for s in item] for item in combinations(items, 2)]
        result.append(x)
for res in result:
    for elem in res:
        print(',\t'.join(elem))
You need a list of lists to represent the pairs. You can build them with a list comprehension inside the loop.
I'm not sure what actual output format you want, but this prints your expected output.
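To see what that list comprehension builds, here is the same expression applied to a single line held in memory (the second sample line with its quotes removed, and n = 2, are assumptions for the demo). Each inner list is one pair, with the line number appended to both items:

```python
from itertools import combinations

line = "AFD,DNGS,SGDH\n"  # second sample line, quotes removed (assumption)
n = 2                      # its 1-based line number

items = line.strip().split(',')
# One inner list per pair; every item gets the "_<line number>" suffix
pairs = [['%s_%d' % (s, n) for s in pair] for pair in combinations(items, 2)]

print(pairs)
# [['AFD_2', 'DNGS_2'], ['AFD_2', 'SGDH_2'], ['DNGS_2', 'SGDH_2']]
```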
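If the input file contains quotes, the simple fix is to change the items line in the code above to
items = line.replace("\"", "").strip().split(',')
This breaks if the data contains other double quotes, so use it only if you know that is not the case.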
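To illustrate that caveat: `str.replace` removes every double quote, including quotes that belong inside a field, while trimming only the outer quotes leaves the inner ones alone. In this sketch `strip_outer_quotes` is a hypothetical helper, and the second input line is invented for illustration (it is not from the question's data):

```python
def strip_outer_quotes(s):
    """Remove at most one double quote from each end of s (hypothetical helper)."""
    if s.startswith('"'):
        s = s[1:]
    if s.endswith('"'):
        s = s[:-1]
    return s

# First line is from the question; the second is made up to show an inner quote
lines = ['"A,B "\n', '"say "hi",TTT"\n']

results = []
for line in lines:
    naive = line.replace('"', '').strip().split(',')                # drops all quotes
    careful = strip_outer_quotes(line.strip()).strip().split(',')   # drops outer ones only
    results.append((naive, careful))
    print(naive, careful)
```

Both approaches agree on the question's data; they only differ when a field itself contains a double quote.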
Otherwise, write a small function that removes the quotes. This example also writes the output to a file.
#!/usr/bin/python
from itertools import combinations
def remquotes(s):
    # strip at most one double quote from each end of s;
    # slicing (s[:1], s[-1:]) instead of indexing avoids IndexError on empty lines
    beg, end = 0, len(s)
    if s[:1] == '"': beg = 1
    if s[-1:] == '"': end = -1
    return s[beg:end]
with open('data1.txt') as f:
    result = []
    for n, line in enumerate(f, start=1):
        items = remquotes(line.strip()).strip().split(',')
        x = [['%s_%d' % (s, n) for s in item] for item in combinations(items, 2)]
        result.append(x)
with open('out.txt', 'w') as fout:
    for res in result:
        for elem in res:
            linestr = ',\t'.join(elem)
            print(linestr)
            fout.write(linestr + '\n')
Posted on 2016-12-19 00:13:21
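This is similar to the other answer, except that, based on the comments, it looks like you actually want to write a tab-separated text file rather than a dictionary.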
#!/usr/bin/python
import itertools
file_name = 'data.txt'
out_file = 'out.txt'
with open(file_name) as infile, open(out_file, "w") as out:
    for n, line in enumerate(infile):
        # tag each item with its 1-based line number
        row = [i + "_" + str(n+1) for i in line.strip().split(",")]
        for i in itertools.combinations(row, 2):
            out.write('\t'.join(i) + '\n')
Posted on 2016-12-19 01:28:31
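The tab-separated-file answer above joins each pair with '\t' by hand. An alternative, not used in that answer, is the standard-library `csv` module with `delimiter='\t'`, which handles separators and line endings for you. This sketch uses `io.StringIO` objects in place of data.txt and out.txt so it runs without files, and assumes the quotes have already been removed from the input:

```python
import csv
import io
import itertools

# In-memory stand-ins for data.txt and out.txt (assumption, for a self-contained demo)
infile = io.StringIO("A,B\nAFD,DNGS,SGDH\nNHYG,QHD,lkd,uyete\nAFD,TTT\n")
out = io.StringIO()

# delimiter='\t' writes tab-separated fields; lineterminator='\n' replaces the default '\r\n'
writer = csv.writer(out, delimiter='\t', lineterminator='\n')

for n, line in enumerate(infile, start=1):
    row = ['%s_%d' % (item, n) for item in line.strip().split(',')]
    # each tuple from combinations() becomes one output row
    writer.writerows(itertools.combinations(row, 2))

print(out.getvalue())
```

With a real file, the `csv` documentation recommends opening it as `open('out.txt', 'w', newline='')`. Note that `csv.writer` will quote any field that contains a tab or a double quote, which the plain `'\t'.join` approach does not do.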
The following seems to do it with minimal code:
import itertools
input_filename = 'data.txt'
output_filename = 'split_data.txt'
with open(input_filename, 'rt') as inp, open(output_filename, 'wt') as outp:
    for n, line in enumerate(inp, 1):
        items = ('{}_{}'.format(x.strip(), n)
                 for x in line.replace('"', '').split(','))
        for combo in itertools.combinations(items, 2):
            outp.write('\t'.join(combo) + '\n')
https://stackoverflow.com/questions/41213905
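In the minimal-code answer, `items` is a generator expression; that works because `itertools.combinations` accepts any iterable and reads it into a tuple before producing pairs. The sketch below wraps the same logic in a function over any iterable of lines (the name `pair_lines` and its `sep` parameter are hypothetical), so it can be checked against the question's sample data held in an `io.StringIO`. With `sep=' '` it reproduces the expected output from the question:

```python
import io
import itertools

def pair_lines(lines, sep='\t'):
    """Yield one output line per pair of items, tagged with the 1-based line number."""
    for n, line in enumerate(lines, 1):
        items = ('{}_{}'.format(x.strip(), n)
                 for x in line.replace('"', '').split(','))
        for combo in itertools.combinations(items, 2):
            yield sep.join(combo)

# The question's sample data, including its quotes and trailing spaces
sample = io.StringIO('"A,B "\n"AFD,DNGS,SGDH "\n"NHYG,QHD,lkd,uyete"\n"AFD,TTT"\n')

output = list(pair_lines(sample, sep=' '))
for out_line in output:
    print(out_line)
```

To write a file instead, pass an open file object as `lines` and write each yielded string followed by '\n'.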