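I am using the following code to change the FASTA names in a FASTA file of DNA sequences. I set the number of sequences to the full number of sequences in the original FASTA file, but the output always contains fewer. In other words, if my original FASTA file contains 50 sequences, the renamed FASTA file has only 49, even though I set the number of sequences to 50. With 100 sequences in the original file, the resulting file ends up with only 98. What am I missing?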
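from itertools import islice
from Bio import SeqIO

# mydatadirpath is assumed to be defined earlier (not shown in the question)
infile = mydatadirpath + "ExportFastaFile.fasta"
records = SeqIO.parse(infile, "fasta")
FileToExportShortNamesTo = mydatadirpath + "ExportShortNamesFastaFile.fasta"
randnumseqs = 50
counter = 0
# The with-block closes the output file once the loop has finished
with open(FileToExportShortNamesTo, "w+") as g:
    for record in islice(records, randnumseqs):
        Name = record.description
        counter = counter + 1
        # New name: first character of the description plus a running number
        Namer = ">" + str(Name)[0:1] + str(counter)
        seqstring = str(record.seq)
        g.write(Namer + "\n" + seqstring + "\n")

I tried increasing the number of sequences to slice by 1, thinking it might be an indexing issue, but that changes nothing. What am I doing wrong?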
Sample input looks like the following, but with 50 records rather than the 10 shown here:
>EAAA1
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTGTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA2
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA3
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA4
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA5
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA6
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA7
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA8
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAAE9
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>EAAA10
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
The output should look like this, with 50 records rather than the 10 shown:
>E1
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTGTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E2
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E3
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E4
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E5
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E6
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E7
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E8
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E9
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
>E10
AGCAGGAGCAACGTACCCTTACCAATTTAGTACGTATTCTTTTACTACTTGAGTTGTTTAATCATTCCTTCCT
Posted on 2020-10-09 10:55:52
Try switching Namer = ">" + str(Name)[0:1] + str(counter) to Namer = ">" + str(Name)[:1] + str(counter)
The 0 makes it always start after the first element.
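For reference, here is a quick check of the two slice forms. In Python an omitted start index defaults to 0, so both expressions should build the same name and this change on its own would not be expected to alter the record count. The value of name below is a hypothetical record description taken from the sample input.

```python
# Hypothetical record description, taken from the sample input above
name = "EAAA1"
counter = 1

# Slice with an explicit start index of 0
namer_explicit = ">" + str(name)[0:1] + str(counter)
# Slice with the start index omitted (it defaults to 0)
namer_implicit = ">" + str(name)[:1] + str(counter)

print(namer_explicit, namer_implicit)
print(namer_explicit == namer_implicit)
```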
Posted on 2020-10-09 11:17:08
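I figured it out. The file I was using as input was created by code in a previous cell. I had not closed that file after creating it, so there was no EOF (presumably the last buffered writes had not yet been flushed to disk), and the code I posted therefore did not read the final record. I closed the file between the two code blocks, all 50 records were there, and the problem was solved.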
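A minimal sketch of one way to avoid this, assuming the input file is written with plain Python file I/O: writing inside a with-block closes the handle (and flushes its buffer) before any later code reads the file. The helper names write_fasta and count_fasta_records, the temporary path, and the 50 dummy records are all hypothetical and only for illustration; the original cell that created the file is not shown in the question.

```python
import os
import tempfile


def write_fasta(path, records):
    """Write (name, sequence) pairs as FASTA records."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n{seq}\n")
    # Leaving the with-block closes the handle and flushes its buffer.


def count_fasta_records(path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Hypothetical data: 50 short records named EAAA1 .. EAAA50
records = [(f"EAAA{i}", "AGCAGGAGCAACG") for i in range(1, 51)]
path = os.path.join(tempfile.mkdtemp(), "ExportFastaFile.fasta")

write_fasta(path, records)
n_records = count_fasta_records(path)
print(n_records)
```

Counting the header lines right after writing is a cheap sanity check that every record reached the disk before the renaming step runs.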
https://stackoverflow.com/questions/64273258