I want to create a Python script that takes the first two columns of an input file (File 1):
10000D 10000R
10003D 10003R
and matches those two columns against columns 2 and 4 of another input file (File 2), which holds the dataset:
0 10000D 0 10000R 0.05
0 10001D 0 10001D 0.06
0 10003D 0 10003R 0.09
Once the columns match, I want to write the File 2 rows that match File 1 to a new output file. The output file should look like this:
0 10000D 0 10000R 0.05
0 10003D 0 10003R 0.09
My code looks like this:
#Python code for pi-hats extraction
#!/usr/bin/python
#open and read file to read from (F1), file to match to (F2), File to write and save to (F3)
F1 = open("File_1", "r") #File_1 is original file, has 2 columns
F2 = open("File_2", "r") #where dataset is kept
F3 = open("File_3", "w") #where matches are stored
for match1 in sorted(F1):
    if match1 in F2:
        F3.write(match)
F3.close()
exit
However, when I run this code, I get no matches.
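A small sketch of why the loop finds nothing, using `io.StringIO` as a hypothetical in-memory stand-in for the two files. `match1 in F2` walks File 2 line by line and tests whole-line equality (newline included), so `"10000D 10000R\n"` can never equal `"0 10000D 0 10000R 0.05\n"`. The first test also reads F2 to its end, so every later test scans an exhausted file. (Separately, `F3.write(match)` refers to an undefined name `match`, so it would raise `NameError` if a line ever did match.)

```python
import io

# Hypothetical in-memory stand-ins for File_1 and File_2 (sample data from the question)
F1 = io.StringIO("10000D 10000R\n10003D 10003R\n")
F2 = io.StringIO("0 10000D 0 10000R 0.05\n0 10001D 0 10001D 0.06\n0 10003D 0 10003R 0.09\n")

results = []
for match1 in sorted(F1):
    # `in` iterates F2 line by line and tests whole-line equality
    results.append(match1 in F2)
print(results)  # [False, False]

# The first membership test consumed F2, so it is now at end-of-file
print(repr(F2.read()))  # ''

# After rewinding, `in` only succeeds for an exact, complete line
F2.seek(0)
print("0 10000D 0 10000R 0.05\n" in F2)  # True
```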
Any suggestions?
Thanks,
DM
Update: the original File 2 looks like this:
0 10000_D 0 10000_R AB 0 1.2345 0.1234 0.0000 0.0000 -1 0.765432 0.05 1.2345
0 10001_D 0 10001_R AB 0 1.2345 0.1234 0.0000 0.0000 -1 0.876543 0.06 1.3456
0 10003_D 0 10003_R AB 0 1.2345 0.1234 0.0000 0.0000 -1 0.987654 0.09 1.4567
Maybe the spacing has something to do with it? I think the formatting may have changed when I put the file into Excel.
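Spacing can indeed matter once lines are split into fields. The sketch below uses two hypothetical lines (one with a doubled space, one tab-separated, as an Excel export might produce) to compare `csv.reader` with `delimiter=' '` against plain `str.split()`. Note also that the updated File 2 writes IDs as `10000_D`, while File 1 uses `10000D`; those strings are not equal no matter how the whitespace is handled.

```python
import csv

# Hypothetical lines: a doubled space, and tabs instead of spaces
double_space = "0  10000_D 0 10000_R"
tab_line = "0\t10000_D\t0\t10000_R"

# A single-space delimiter turns the doubled space into an empty field,
# which shifts column 2 (index 1) to ''
csv_fields = next(csv.reader([double_space], delimiter=" "))
print(csv_fields)  # ['0', '', '10000_D', '0', '10000_R']

# Tabs are not the delimiter, so the whole line comes back as one field
tab_fields = next(csv.reader([tab_line], delimiter=" "))
print(tab_fields)  # ['0\t10000_D\t0\t10000_R']

# str.split() with no argument splits on any run of whitespace
split_fields = double_space.split()
print(split_fields)  # ['0', '10000_D', '0', '10000_R']
print(tab_line.split())  # ['0', '10000_D', '0', '10000_R']
```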
Posted on 2018-06-27 01:12:00
import csv
with open("File_1", "r") as F1: #File_1 is original file, has 2 columns
    # split the file using a space as delimiter and read it into memory:
    F1_d = sorted(csv.reader(F1, delimiter=' '))
with open("File_2", "r") as F2: #where dataset is kept
    # split the file using space again, and read it into a dictionary
    # structure indexed by the second and fourth columns:
    F2_d = {(row[1], row[3]): row for row in csv.reader(F2, delimiter=' ')}
with open("File_3", "w") as F3: #where matches are stored
    for match1 in F1_d:
        if tuple(match1) in F2_d: # search for a match using the index defined above
            F3.write(' '.join(F2_d[tuple(match1)]) + '\n')
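The answer above assumes single spaces and identical IDs in both files. For the updated File 2, a minimal sketch under two stated assumptions: fields may be separated by any whitespace, and `10000_D` and `10000D` name the same sample (so underscores are dropped before comparing). The helpers `normalize`, `match_rows`, and `match_files` are hypothetical names introduced for illustration.

```python
def normalize(token):
    # Assumption: "10000_D" and "10000D" are the same ID, so drop underscores
    return token.replace("_", "")


def match_rows(f1_lines, f2_lines):
    """Return the File 2 lines whose columns 2 and 4 form a pair listed in File 1."""
    wanted = set()
    for line in f1_lines:
        fields = line.split()  # no argument: split on any run of spaces or tabs
        if len(fields) >= 2:
            wanted.add((normalize(fields[0]), normalize(fields[1])))

    matches = []
    for line in f2_lines:
        fields = line.split()
        # Columns 2 and 4 are fields[1] and fields[3] (0-based indexing)
        if len(fields) >= 4 and (normalize(fields[1]), normalize(fields[3])) in wanted:
            matches.append(line.rstrip("\n"))  # keep the original line text
    return matches


def match_files(path1="File_1", path2="File_2", path3="File_3"):
    # File names follow the question; adjust as needed
    with open(path1) as f1, open(path2) as f2:
        matches = match_rows(f1, f2)
    with open(path3, "w") as f3:
        for line in matches:
            f3.write(line + "\n")


# Demo with the sample data from the question instead of real files
file1 = ["10000D 10000R\n", "10003D 10003R\n"]
file2 = [
    "0 10000_D 0 10000_R AB 0 1.2345 0.1234 0.0000 0.0000 -1 0.765432 0.05 1.2345\n",
    "0 10001_D 0 10001_R AB 0 1.2345 0.1234 0.0000 0.0000 -1 0.876543 0.06 1.3456\n",
    "0 10003_D 0 10003_R AB 0 1.2345 0.1234 0.0000 0.0000 -1 0.987654 0.09 1.4567\n",
]
for line in match_rows(file1, file2):
    print(line)
```

Calling `match_files()` reads `File_1` and `File_2` and writes the matching lines to `File_3`. Unlike the answer, this keeps File 2's line order and original text instead of rejoining the fields with single spaces in sorted File 1 order; the set lookup keeps each File 2 line check constant-time.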
https://stackoverflow.com/questions/51048071