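I want to know which order, family and genus a species belongs to in the NCBI taxonomy. For a single species I can search the NCBI website by hand, but how can this be done when there are a large number of species?
While reading the ete3 documentation some time ago, I came across a feature that does exactly this: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html. I may need it soon, so I am recording the code I use here.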
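(First, install ete3. On my Windows 10 machine with Anaconda3, running pip install ete3 in a Command Prompt window is all it takes. The first time NCBITaxa() is created, ete3 downloads the NCBI taxonomy dump and builds a local SQLite database, by default ~/.etetoolkit/taxa.sqlite, so the first run takes a while.)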
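from ete3 import NCBITaxa

ncbi = NCBITaxa()  # create an instance (the parentheses are required)
# Translate the scientific name into NCBI taxid(s): {name: [taxid, ...]}
name2taxid = ncbi.get_name_translator(["Punica granatum"])
for name, taxids in name2taxid.items():
    # Lineage from root down to the species, as a list of taxids
    lineage = ncbi.get_lineage(taxids[0])
    # Translate the taxids back into scientific names
    names = ncbi.get_taxid_translator(lineage)
    for taxid in lineage:
        print(names[taxid])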
Output:
root
cellular organisms
Eukaryota
Viridiplantae
Streptophyta
Streptophytina
Embryophyta
Tracheophyta
Euphyllophyta
Spermatophyta
Magnoliophyta
Mesangiospermae
eudicotyledons
Gunneridae
Pentapetalae
rosids
malvids
Myrtales
Lythraceae
Punica
Punica granatum
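The lineage above lists every level, while the original question only asks for the order, family and genus. NCBITaxa also has a get_rank() method that maps taxids to rank names such as "order", "family", "genus" or "no rank", so printing each name next to its rank makes the answer easy to pick out. This is a minimal sketch, assuming ete3 is installed and its local database is available; lineage_with_ranks and main are hypothetical helper names, and ete3 is imported inside main() only so the helper can be tried with any object that has the same methods.

```python
import sys


def lineage_with_ranks(ncbi, species_name):
    """Return [(rank, name), ...] from root down to the species.

    `ncbi` is assumed to be an ete3 NCBITaxa instance (or any object with the
    same methods). Returns [] if the name is not found.
    """
    name2taxid = ncbi.get_name_translator([species_name])
    if not name2taxid:
        return []
    # Only one name was queried; take its first taxid, as the code above does
    taxid = next(iter(name2taxid.values()))[0]
    lineage = ncbi.get_lineage(taxid)           # taxids, root -> species
    names = ncbi.get_taxid_translator(lineage)  # {taxid: scientific name}
    ranks = ncbi.get_rank(lineage)              # {taxid: rank string}
    return [(ranks.get(t, ""), names[t]) for t in lineage]


def main(species_name):
    from ete3 import NCBITaxa  # imported here so the helper itself does not need ete3
    ncbi = NCBITaxa()
    for rank, name in lineage_with_ranks(ncbi, species_name):
        print(f"{rank}\t{name}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage (hypothetical script name): python lineage_ranks.py "Punica granatum". The lines whose rank column reads order, family and genus are the ones the question is after.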
Input file Organism_name.txt used below (one species name per line):
Lumnitzera littorea
Punica granatum
Heimia myrtifolia
Sonneratia alba
Epilobium ulleungensis
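Code (get_species_placement_in_NCBI.py, batch version: reads one species name per line and writes one comma-separated lineage per line):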
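import sys
from ete3 import NCBITaxa

input_file = sys.argv[1]   # one species name per line
output_file = sys.argv[2]  # one comma-separated lineage per line

ncbi = NCBITaxa()

with open(input_file, "r") as fr, open(output_file, "w") as fw:
    for line in fr:
        species_name = line.strip()
        if not species_name:
            continue  # skip blank lines
        # Translate the scientific name into NCBI taxid(s)
        name2taxid = ncbi.get_name_translator([species_name])
        if not name2taxid:
            print(species_name + ":", "not found")
            continue
        for name, taxids in name2taxid.items():
            # Lineage from root down to the species, as taxids
            lineage = ncbi.get_lineage(taxids[0])
            # Map each taxid to its scientific name and join them with commas
            names = ncbi.get_taxid_translator(lineage)
            fw.write(",".join(names[taxid] for taxid in lineage) + "\n")
        print(species_name + ":", "OK")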
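For a long species list, one possible variation: get_name_translator() accepts a list, so all names can be translated in a single call, and any query name missing from the returned dict was not found in the local database and can be reported separately. This is a sketch under the same assumptions (ete3 installed, database built); translate_all, lineage_line and main are hypothetical names. Like the script above, it uses the first taxid (taxids[0]) when a name maps to more than one.

```python
import sys


def translate_all(ncbi, species_names):
    """Translate all names in one call; return (name2taxid, missing_names)."""
    name2taxid = ncbi.get_name_translator(species_names)
    # The result's keys are the query names that were found
    missing = [n for n in species_names if n not in name2taxid]
    return name2taxid, missing


def lineage_line(ncbi, taxid):
    """Comma-separated lineage names from root down to `taxid`."""
    lineage = ncbi.get_lineage(taxid)
    names = ncbi.get_taxid_translator(lineage)
    return ",".join(names[t] for t in lineage)


def main(input_file, output_file):
    from ete3 import NCBITaxa  # imported here so the helpers do not need ete3
    ncbi = NCBITaxa()
    with open(input_file) as fr:
        species_names = [line.strip() for line in fr if line.strip()]
    name2taxid, missing = translate_all(ncbi, species_names)
    with open(output_file, "w") as fw:
        for name in species_names:
            if name in name2taxid:
                # Use the first taxid for each name
                fw.write(lineage_line(ncbi, name2taxid[name][0]) + "\n")
    for name in missing:
        print(name + ": not found", file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv[1], sys.argv[2])
```

It is called the same way as the script above: an input file with one name per line, then an output file name.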
# Usage
python .\get_species_placement_in_NCBI.py .\Organism_name.txt placement.txt
# Output
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Combretaceae,Lumnitzera,Lumnitzera littorea
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Punica,Punica granatum
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Heimia,Heimia myrtifolia
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Sonneratia,Sonneratia alba
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Onagraceae,Onagroideae,Epilobieae,Epilobium,Epilobium ulleungensis
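Note that these lineages do not all have the same depth: Epilobium ulleungensis has two extra levels (Onagroideae, Epilobieae) between the family and the genus, so the family is not always in the same column. To get a table with one column per rank, the ranks can be looked up with NCBITaxa.get_rank() instead of relying on position. A minimal sketch, assuming ete3 and its local database are available; ranks_for, write_table, the tab-separated layout and the "NA" placeholder for names not found are illustrative choices, not part of ete3.

```python
import csv
import sys

WANTED_RANKS = ("order", "family", "genus")  # ranks to report; adjust as needed


def ranks_for(ncbi, species_name, wanted=WANTED_RANKS):
    """Return {rank: name} for the wanted ranks; {} if the name is not found."""
    name2taxid = ncbi.get_name_translator([species_name])
    if not name2taxid:
        return {}
    taxid = next(iter(name2taxid.values()))[0]  # first taxid, as in the script above
    lineage = ncbi.get_lineage(taxid)
    ranks = ncbi.get_rank(lineage)              # {taxid: rank string}
    names = ncbi.get_taxid_translator(lineage)  # {taxid: scientific name}
    return {ranks[t]: names[t] for t in lineage if ranks.get(t) in wanted}


def write_table(ncbi, species_names, out):
    """Write a tab-separated table: species, then one column per wanted rank."""
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(("species",) + WANTED_RANKS)
    for sp in species_names:
        found = ranks_for(ncbi, sp)
        writer.writerow([sp] + [found.get(r, "NA") for r in WANTED_RANKS])


def main(input_file, output_file):
    from ete3 import NCBITaxa  # imported here so the helpers do not need ete3
    ncbi = NCBITaxa()
    with open(input_file) as fr:
        names = [line.strip() for line in fr if line.strip()]
    with open(output_file, "w", newline="") as fw:
        write_table(ncbi, names, fw)


if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv[1], sys.argv[2])
```

Looking up by rank keeps the order, family and genus columns aligned regardless of how many intermediate levels (subfamily, tribe, "no rank" clades) a lineage contains.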