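I want my program to grab the subspecies name (e.g. 'Ablepharus bivittatus') and store it as a string key. Then I want the program to take the lines of sequence IDs (integers) that follow, up to the next subspecies heading. The integers would be stored as the values of the subspecies key grabbed just above.
I'd like the program to prompt the user for a string, search all dictionary keys for an exact match (case and spelling matter), and then return the sequence IDs.
What is the most efficient way to do this? Right now I can separate the two entities (IDs and subspecies names), but I don't know how to build a dictionary to store these values while iterating over the text file.
Some lines contain the same name repeated several times. How can I tell the program to detect this and use only the first occurrence of the subspecies name as the string key?
Sorry for taking up your time.
The text file has the following format: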
Ablepharus bivittatus
36630
31764
31212
01996
09953
03744
14036
16094
01875
19076
09496
20583
24160
23142
26892
06533
05488
Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953
Ablepharus chernovi eiselti SCHMIDTLER 1997
Ablepharus chernovi isauriensis SCHMIDTLER 1997
Ablepharus chernovi ressli SCHMIDTLER 1997
31212
01996
09637
14036
20583
23142
21989
26892
28697
09207
09206
Ablepharus darvazi
06245
26892

So far, I have been working on some code for this:
dictionary = {}
with open("repCleanSubs2.txt") as file:
    for line in file:
        (key, val) = line.split()
        dictionary[val(key)] = val
        print key(1)
'''import re
file = open('repCleanSubs2.txt')
subspecies = []
dnaIDs = []
for line in file:
    match = re.findall('^[a-zA-Z]+', line)
    if match:
        subspecies.append(line)
        #Grab sequence IDs under this line ^
        #
        #Until you reach next string match
print dnaIDs
#userInput = raw_input("Which subspecies would you like to view?: ")
#if userInput == re.match(subspecies(line)):
#    print subspecies(line)'''
#    print sequences IDs from the line grabbed here ^

Posted on 2016-04-30 06:36:31
Use file.read().splitlines() to get a list of lines. This seems to do what you want:
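import re

data = {}
with open("data.txt") as f:
    lines = f.read().splitlines()

name = ""
for l in lines:
    if re.match(r"\d{5}", l):
        # ID line: file it under the most recent name
        data[name].append(l)
    else:
        # any other line starts a new name with an empty ID list
        name = l.strip()
        data[name] = []
print(data)

The result is as follows: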
{
"Ablepharus chernovi isauriensis SCHMIDTLER 1997": [],
"Ablepharus bivittatus": [
"36630",
"31764",
"31212",
"01996",
"09953",
"03744",
"14036",
"16094",
"01875",
"19076",
"09496",
"20583",
"24160",
"23142",
"26892",
"06533",
"05488"
],
"Ablepharus chernovi ressli SCHMIDTLER 1997": [
"31212",
"01996",
"09637",
"14036",
"20583",
"23142",
"21989",
"26892",
"28697",
"09207",
"09206"
],
"Ablepharus darvazi": [
"06245",
"26892"
],
"Ablepharus chernovi eiselti SCHMIDTLER 1997": [],
"Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953": []
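}

I'm not sure what you mean by some lines containing the same name repeated. If you can elaborate on that and show your expected output, it can be incorporated.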
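
If "the same name repeated" refers to heading blocks like the four consecutive Ablepharus chernovi lines, one possible reading is that the whole block should collapse into a single key. The sketch below is written for Python 3 and rests on assumptions that are not confirmed by the question: the key is taken to be the first two words of the first heading line ("Genus species"), any line made up only of digits is treated as an ID, and the function name `parse_records` is made up for illustration.

```python
import re

# Assumption: a line consisting only of digits is a sequence ID.
ID_LINE = re.compile(r"^\d+$")


def parse_records(lines):
    """Map each species key to the IDs listed under its heading block."""
    data = {}
    key = None
    in_heading = False  # True while reading consecutive heading lines
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if ID_LINE.match(line):
            if key is not None:  # ignore IDs that appear before any heading
                data[key].append(line)
            in_heading = False
        elif not in_heading:
            # First heading line of a block: keep only "Genus species"
            # (assumed key format, not stated in the question).
            key = " ".join(line.split()[:2])
            data.setdefault(key, [])
            in_heading = True
        # Further heading lines in the same block are skipped.
    return data
```

Because a file object yields its lines when iterated, something like `parse_records(f)` inside `with open("data.txt") as f:` should work without calling `splitlines()` first. IDs are kept as strings so that leading zeros such as `01996` survive.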
Finally, returning the sequence IDs for a key supplied by the user looks like this:
print(data[raw_input()])

https://stackoverflow.com/questions/36951031
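
Indexing the dictionary directly raises a KeyError when the typed name has no exact match. A slightly more defensive lookup could look like the sketch below. It is written for Python 3, where `input` replaces `raw_input`. The helper names `lookup` and `prompt_and_print` are hypothetical, and stripping surrounding whitespace from the user's input is an assumption rather than something the question asks for.

```python
def lookup(data, name):
    """Return the IDs stored under exactly `name`, or None if there is no such key."""
    # dict.get is case-sensitive and exact, like data[name], but does not raise.
    return data.get(name)


def prompt_and_print(data, read=input):
    """Ask for a name and print its IDs, one per line."""
    # Assumption: leading/trailing whitespace in the typed name is not significant.
    name = read("Which subspecies would you like to view? ").strip()
    ids = lookup(data, name)
    if ids is None:
        print("No exact match for %r" % name)
    else:
        print("\n".join(ids))
```

The dictionary lookup itself takes constant time on average, so there is no need to loop over all keys to find an exact match.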