I have the file below, and I want to convert the content of every fourth line into a single number.
sample.fastq
@HISE
GGATCGCAATGGGTA
+
CC@!$%*&J#':AAA
@HISE
ATCGATCGATCGATA
+
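()**D12EFHI@$;;
The fourth line is a string of characters, and each character corresponds to a number (stored in a dictionary). I want to convert each character to its number and then compute the average of all those numbers for that line.
I've gotten as far as printing each character individually, but I'm stuck on how to replace the characters with their numbers and then move on from there.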
script.py
d = {
'!':0, '"':1, '#':2, '$':3, '%':4, '&':5, '\'':6, '(':7, ')':8,
'*':9, '+':10, ',':11, '-':12, '.':13, '/':14, '0':15,'1':16,
'2':17, '3':18, '4':19, '5':20, '6':21, '7':22, '8':23, '9':24,
':':25, ';':26, '<':27, '=':28, '>':29, '?':30, '@':31, 'A':32, 'B':33,
'C':34, 'D':35, 'E':36, 'F':37, 'G':38, 'H':39, 'I':40, 'J':41 }
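The dictionary above appears to follow a simple rule: every score equals the character's ASCII code minus 33 ('!' is 33, 'J' is 74), which looks like the standard Phred+33 quality encoding used in FASTQ files. Assuming that rule holds for all characters you expect, the same table can be generated instead of typed out, as in this sketch:

```python
# Build the lookup table from ASCII codes instead of typing it by hand.
# Assumption: score = ord(ch) - 33 for every character from '!' to 'J' (Phred+33).
d = {chr(code): code - 33 for code in range(ord("!"), ord("J") + 1)}

print(len(d))                          # number of entries, '!' through 'J'
print(d["!"], d["'"], d["@"], d["J"])  # spot-check against the hand-written table
```

This prints 42 and then "0 6 31 41", matching the hand-written entries. If your files can contain characters above 'J', the range (or the original dictionary) would need to be extended.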
with open('sample.fastq') as fin:
    for i in fin.readlines()[3::4]:
        for j in i:
            print(j)
The output should look like this, and be stored in a new file:
output.txt
@HISE
GGATCGCAATGGGTA
+
19 #From 34 34 31 0 3 4 9 5 41 2 6 25 32 32 32
@HISE
ATCGATCGATCGATA
+
23 #From 7 8 9 9 35 16 17 36 37 39 40 31 3 26 26
Is what I'm proposing possible?
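The expected averages can be checked by hand. This small sketch (assuming score = ord(ch) - 33, which reproduces the dictionary d for these characters) prints the sum, the count, the truncated average, and the rounded average for each sample quality line:

```python
# Recompute the two expected averages from the sample quality lines.
# Assumption: score = ord(ch) - 33, equivalent to the question's dict d here.
qualities = ["CC@!$%*&J#':AAA", "()**D12EFHI@$;;"]

for q in qualities:
    scores = [ord(ch) - 33 for ch in q]
    total = sum(scores)
    # sum, count, truncated (floor) average, rounded average
    print(total, len(scores), total // len(scores), round(total / len(scores)))
```

This gives "290 15 19 19" and "339 15 22 23": the first record averages 19.33 and the second 22.6, so the expected 23 corresponds to rounding, while truncation gives 22.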
Posted on 2015-02-24 18:19:05
You can do this with a for loop over the lines of the input file:
# d is the character-to-score dictionary defined in the question
with open('sample.fastq') as fin, open('outfile.fastq', "w") as outf:
    for i, line in enumerate(fin):
        if i % 4 == 3:  # only change every fourth line
            # strip the trailing newline so it is not looked up in d
            qualities = [d[ch] for ch in line.rstrip("\n")]
            # take the average quality score; // truncates to an integer
            # (22.6 -> 22), use round() instead to get 23 as in the question
            average = sum(qualities) // len(qualities)
            # new version; average with \n at end
            line = str(average) + "\n"
        # write line (or new version thereof)
        outf.write(line)
This produces the following output:
@HISE
GGATCGCAATGGGTA
+
19
@HISE
ATCGATCGATCGATA
+
22
Posted on 2015-02-24 18:18:42
Assuming you read from stdin and write to stdout:
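from sys import stdin
# d is the character-to-score dictionary defined in the question
for i, line in enumerate(stdin, 1):
    line = line.rstrip("\n")  # remove the newline (safe even if the last line has none)
    if i % 4 != 0:
        print(line)
        continue
    nums = [d[c] for c in line]
    print(sum(nums) / float(len(nums)))
https://stackoverflow.com/questions/28702974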
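Both answers follow the same pattern: pass lines 1 to 3 of each four-line record through unchanged and replace line 4 with its average. A minimal sketch that wraps this in a reusable function is shown below. The names phred33 and average_quality_lines are hypothetical, the score rule ord(ch) - 33 is an assumption that matches the question's dictionary, and the use_round flag chooses between truncation (22) and rounding (23). An empty quality line would raise ZeroDivisionError.

```python
import io
from typing import Iterable, Iterator


def phred33(ch: str) -> int:
    """Score of one quality character (assumed Phred+33, same values as dict d)."""
    return ord(ch) - 33


def average_quality_lines(lines: Iterable[str], use_round: bool = False) -> Iterator[str]:
    """Yield each line without its newline, replacing every 4th line
    (the quality line of a FASTQ record) with the average of its scores."""
    for i, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if i % 4 == 0:
            scores = [phred33(ch) for ch in line]
            mean = sum(scores) / len(scores)
            # int() truncates (scores are non-negative); round() rounds to nearest
            line = str(round(mean) if use_round else int(mean))
        yield line


# Usage with the sample from the question (an in-memory file for illustration).
sample = io.StringIO(
    "@HISE\nGGATCGCAATGGGTA\n+\nCC@!$%*&J#':AAA\n"
    "@HISE\nATCGATCGATCGATA\n+\n()**D12EFHI@$;;\n"
)
for out_line in average_quality_lines(sample):
    print(out_line)
```

To store the result in a new file as the question asks, one option is to open the real input file, open an output file in "w" mode, and write each yielded line followed by "\n".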