# Chapter 5: Motifs and Loops

1. Flow Control

1.1 Conditional Statements

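    if 1 == 1:
        print("1 equals 1\n\n")

Output:

    1 equals 1

A test condition can also be a bare value:

    if 1:
        print("1 evaluates to true\n\n")

Output:

    1 evaluates to true

The next two conditions are false, so nothing is printed:

    if 1 == 0:
        print("1 equals 0\n\n")

    if 0:
        print("0 evaluates to true\n\n")

An if-else statement whose condition is true:

    if 1 == 1:
        print("1 equals 1\n\n")
    else:
        print("1 does not equal 1\n\n")

Output:

    1 equals 1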

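An if-else statement performs one action when the test condition is True and a different action when it is False. The following if-else statement has a condition that evaluates to False: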

    if 1 == 0:
        print("1 equals 0\n\n")
    else:
        print("1 does not equal 0\n\n")

Output:

    1 does not equal 0

1.1.1 Conditionals and Indentation

    #!/usr/bin/env python
    # if-elif-else

    word = 'MNIDDKL'

    # if-elif-else conditionals
    if word == 'QSTVSGE':
        print("QSTVSGE\n")
    elif word == 'MRQQDMISHDEL':
        print("MRQQDMISHDEL\n")
    elif word == 'MNIDDKL':
        print("MNIDDKL--the magic word!\n")
    else:
        print('Is "%s" a peptide? This program is not sure.\n' % word)

    exit()

Output:

    MNIDDKL--the magic word!

1.2 Loops

    #!/usr/bin/env python
    import os

    # Reading protein sequence data from a file, take 4

    # The filename of the file containing the protein sequence data
    proteinfilename = 'NM_021964fragment.pep'

    # First we have to "open" the file, and in case the
    # open fails, print an error message and exit the program.
    if os.path.exists(proteinfilename):
        PROTEINFILE = open(proteinfilename)
    else:
        print("Could not open file %s!\n" % proteinfilename)
        exit()

    # Read the protein sequence data from the file in a "while" loop,
    # printing each line as it is read.
    protein = PROTEINFILE.readline()
    while protein:
        print(" ###### Here is the next line of the file:\n")
        print(protein)
        protein = PROTEINFILE.readline()

    # Close the file.
    PROTEINFILE.close()

    exit()

Output:

     ###### Here is the next line of the file:
    MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD
     ###### Here is the next line of the file:
    SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ
     ###### Here is the next line of the file:
    GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR

1.2.1 The open Function and the os Module

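Calling open involves a system call: to open a file, Python must request it from the operating system. The operating system may be Unix/Linux, Microsoft Windows, Apple Macintosh, and so on; files are managed by the operating system and can only be accessed through it. Checking whether a system call succeeded or failed is a good habit, especially when opening files. If the call fails and the program does not check, it will go on trying to read from or write to a file it could not open. Always check for failure, and when a file cannot be opened, tell the user immediately or exit the program.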
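As a minimal sketch of such a check, assuming Python 3: instead of testing `os.path.exists` first, the call to `open` can be wrapped in `try`/`except`, which also catches failures such as missing read permission. The helper name `open_or_none` is hypothetical and only used for illustration.

```python
import sys


def open_or_none(filename):
    """Try to open filename for reading; return the file object, or None on failure."""
    try:
        return open(filename)
    except OSError as err:
        # OSError covers a missing file, a directory, missing permissions, etc.
        print("Could not open file %s: %s" % (filename, err), file=sys.stderr)
        return None
```

A caller would test the result for `None` and exit or ask for another filename before reading any data.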

2. Searching for Motifs

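Python has a convenient set of features for searching strings. Example 5-3 introduces them; programs like it are used all the time in biological research. It reads protein sequence data from a file, joins the lines into a single string, and then repeatedly asks the user for a motif and reports whether it was found: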

    #!/usr/bin/env python
    import os

    # Searching for motifs

    # Ask the user for the filename of the file containing
    # the protein sequence data, and collect it from the keyboard
    print("Please type the filename of the protein sequence data: ")
    proteinfilename = input()

    # open the file, or exit
    if os.path.exists(proteinfilename):
        PROTEINFILE = open(proteinfilename)
    else:
        print("Could not open file %s!\n" % proteinfilename)
        exit()

    # Read the protein sequence data from the file, and store it
    # into the list variable proteins
    proteins = PROTEINFILE.readlines()

    # Close the file - we've read all the data into proteins now.
    PROTEINFILE.close()

    # Put the protein sequence data into a single string, as it's easier
    # to search for a motif in a string than in a list of
    # lines (what if the motif occurs over a line break?)
    protein = ''.join(proteins)

    # Remove newlines
    protein = protein.replace('\n', '')

    # In a loop, ask the user for a motif, search for the motif,
    # and report if it was found.
    # Exit if no motif is entered.
    while True:
        print("Enter a motif to search for: ")
        motif = input()

        # exit on an empty user input
        if not motif:
            break

        # Look for the motif
        if protein.find(motif) != -1:
            print("I found it!\n\n")
        else:
            print("I couldn't find it.\n\n")

    # exit the program
    exit()

Output:

    Please type the filename of the protein sequence data: NM_021964fragment.pep
    Enter a motif to search for: SVLQ
    I found it!
    Enter a motif to search for: jkl
    I couldn't find it.
    Enter a motif to search for: QDSV
    I found it!
    Enter a motif to search for: HERLPQGLQ
    I found it!
    Enter a motif to search for:
    I couldn't find it.

The find and strip functions

not

2.1 Getting User Input from the Keyboard

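Python uses a built-in function to collect what the user types at the keyboard. In Example 5-3, the function input reads the user's input and returns it as a string. When the user types a filename and sends it by pressing Enter, the filename is stored in the variable proteinfilename.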
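A small sketch of the same idea, assuming Python 3: `input` also accepts a prompt string, and `strip` removes whitespace around the answer. The function name `ask_filename` and its `read` parameter are hypothetical; `read` defaults to the built-in `input` and exists only so the function can be tried without a keyboard.

```python
def ask_filename(prompt="Please type the filename of the protein sequence data: ",
                 read=input):
    """Ask for a filename and return it without surrounding whitespace."""
    answer = read(prompt)   # input() returns a str without the trailing newline
    return answer.strip()   # drop accidental spaces around the name


# Simulate a user who typed the name with stray spaces.
print(ask_filename(read=lambda prompt: "  NM_021964fragment.pep  "))
```

In a real program the call would simply be `proteinfilename = ask_filename()`.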

2.2 Using join to Convert a List to a String

protein = ''.join(proteins)

DNA3 = DNA1 + DNA2

DNA3 = ''.join([DNA1, DNA2])

[DNA1, DNA2]

2.3 Implementing a do-until Style Loop in Python
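Python has no built-in do-until statement. One common way to get the same effect, and the one Example 5-3 uses, is a `while True` loop that runs its body at least once and leaves with `break` once the exit condition holds. The sketch below assumes a hypothetical helper, `collect_motifs`, that reads answers from any iterable of strings instead of the keyboard.

```python
def collect_motifs(answers):
    """Collect motifs until an empty answer is seen (do-until style)."""
    answers = iter(answers)
    motifs = []
    while True:
        motif = next(answers, "")   # the body runs first ...
        if not motif:               # ... then the exit test, as in do-until
            break
        motifs.append(motif)
    return motifs


print(collect_motifs(["SVLQ", "QDSV", "", "ignored"]))  # ['SVLQ', 'QDSV']
```

Because the test sits inside the loop, the first prompt always happens, which is the behavior a do-until loop provides.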

2.4 The String Method find and Indexes

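Python makes it easy to manipulate all kinds of strings, such as DNA and protein sequence data. The built-in string method find searches for a substring (the motif) in a string (the protein sequence). It returns the starting index of the substring if it is found, and -1 otherwise.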
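A short sketch of `find`, using the first two lines of the protein file shown earlier. It also illustrates why the lines are joined first: a motif that spans a line break is found only in the joined string. The variable names are illustrative.

```python
lines = [
    "MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD\n",
    "SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ\n",
]
protein = "".join(lines).replace("\n", "")

print(protein.find("MNIDDKL"))      # 0: found at the very start
print(protein.find("jkl"))          # -1: not present
print(protein.find("QDSV") != -1)   # True: the motif spans the line break
print(lines[0].find("QDSV"))        # -1: not found within the first line alone
```

Note that a match at index 0 is a valid result, so the return value should be compared with -1 rather than tested for truth. When only a yes/no answer is needed, `motif in protein` is an equivalent test.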

3. Counting Nucleotides

    for each base in the DNA
        if base is A
            count_of_A = count_of_A + 1
        if base is C
            count_of_C = count_of_C + 1
        if base is G
            count_of_G = count_of_G + 1
        if base is T
            count_of_T = count_of_T + 1
    done

    print count_of_A, count_of_C, count_of_G, count_of_T

4. Converting a String to a List

    read in the DNA from a file

    join the lines of the file into a single string DNA

    # make a list out of the bases of DNA
    DNA = explode DNA

    # initialize the counts
    count_of_A = 0
    count_of_C = 0
    count_of_G = 0
    count_of_T = 0

    for each base in DNA
        if base is A
            count_of_A = count_of_A + 1
        if base is C
            count_of_C = count_of_C + 1
        if base is G
            count_of_G = count_of_G + 1
        if base is T
            count_of_T = count_of_T + 1
    done

    print count_of_A, count_of_C, count_of_G, count_of_T

    #!/usr/bin/env python
    import os

    # Determining frequency of nucleotides

    # Get the name of the file with the DNA sequence data
    print("Please type the filename of the DNA sequence data: ")
    dna_filename = input()

    # open the file, or exit
    if os.path.exists(dna_filename):
        DNAFILE = open(dna_filename)
    else:
        print("Could not open file %s!\n" % dna_filename)
        exit()

    # Read the DNA sequence data from the file, and store it
    # into the list variable DNAs
    DNAs = DNAFILE.readlines()

    # Close the file
    DNAFILE.close()

    # From the lines of the DNA file,
    # put the DNA sequence data into a single string.
    DNA = ''.join(DNAs)

    # Remove newlines
    DNA = DNA.replace('\n', '')

    # Now explode the DNA into a list where each letter of the
    # original string is now an element in the list.
    # This will make it easy to look at each position.
    # Notice that we're reusing the variable DNA for this purpose.
    DNA = list(DNA)

    # Initialize the counts.
    count_of_A = 0
    count_of_C = 0
    count_of_G = 0
    count_of_T = 0
    errors = 0

    # In a loop, look at each base in turn, determine which of the
    # four types of nucleotides it is, and increment the
    # appropriate count. (Python has no ++ operator; use += 1.)
    for base in DNA:
        if base == 'A':
            count_of_A += 1
        elif base == 'C':
            count_of_C += 1
        elif base == 'G':
            count_of_G += 1
        elif base == 'T':
            count_of_T += 1
        else:
            print("!!!!!!!! Error - I don't recognize this base: %s\n" % base)
            errors += 1

    # print the results
    print("A = %s\n" % count_of_A)
    print("C = %s\n" % count_of_C)
    print("G = %s\n" % count_of_G)
    print("T = %s\n" % count_of_T)
    print("errors = %s\n" % errors)

    # exit the program
    exit()

AAAAAAAAAAAAAAGGGGGGGTTTTCCCCCCCCCCCCCGTCGTAGTAAAGTATGCAGTAGCVGCCCCCCCCCCGGGGGGGGAAAAAAAAAAAAAAATTTTTTATAAACG

Output:

    Please type the filename of the DNA sequence data: small.dna
    !!!!!!!! Error - I don't recognize this base: V
    A = 40
    C = 27
    G = 24
    T = 17

DNA = list(DNA)
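To illustrate the statement above with a minimal sketch: `list` applied to a string returns a list with one single-character string per position, and `''.join` reverses the conversion. The sample sequence is made up.

```python
DNA = "ACGTV"
bases = list(DNA)

print(bases)            # ['A', 'C', 'G', 'T', 'V']
print(len(bases))       # 5
print("".join(bases))   # ACGTV
```

Strings are themselves iterable in Python, so a `for` loop can also walk over the string directly without this conversion.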

    #!/usr/bin/env python
    import os

    # Determining frequency of nucleotides

    # Get the name of the file with the DNA sequence data
    print("Please type the filename of the DNA sequence data: ")
    dna_filename = input()

    # open the file, or exit
    if os.path.exists(dna_filename):
        DNAFILE = open(dna_filename)
    else:
        print("Could not open file %s!\n" % dna_filename)
        exit()

    # Read the DNA sequence data from the file, and store it
    # into the list variable DNAs
    DNAs = DNAFILE.readlines()

    # Close the file
    DNAFILE.close()

    # From the lines of the DNA file,
    # put the DNA sequence data into a single string.
    DNA = ''.join(DNAs)

    # Remove newlines
    DNA = DNA.replace('\n', '')

    # Initialize the counts.
    count_of_A = 0
    count_of_C = 0
    count_of_G = 0
    count_of_T = 0
    errors = 0

    # In a loop, look at each base in turn, determine which of the
    # four types of nucleotides it is, and increment the
    # appropriate count. The loop iterates over the string directly.
    for base in DNA:
        if base == 'A':
            count_of_A += 1
        elif base == 'C':
            count_of_C += 1
        elif base == 'G':
            count_of_G += 1
        elif base == 'T':
            count_of_T += 1
        else:
            print("!!!!!!!! Error - I don't recognize this base: %s\n" % base)
            errors += 1

    # print the results
    print("A = %s\n" % count_of_A)
    print("C = %s\n" % count_of_C)
    print("G = %s\n" % count_of_G)
    print("T = %s\n" % count_of_T)
    print("errors = %s\n" % errors)

    # exit the program
    exit()

5. Using Indexes

    read in the DNA from a file

    join the lines of the file into a single string DNA

    # initialize the counts
    count_of_A = 0
    count_of_C = 0
    count_of_G = 0
    count_of_T = 0

    for each base at each position in DNA
        if base is A
            count_of_A = count_of_A + 1
        if base is C
            count_of_C = count_of_C + 1
        if base is G
            count_of_G = count_of_G + 1
        if base is T
            count_of_T = count_of_T + 1
    done

    print count_of_A, count_of_C, count_of_G, count_of_T

Example 5-5. Determining nucleotide frequency, take 2

    #!/usr/bin/env python
    import os

    # Determining frequency of nucleotides

    # Get the name of the file with the DNA sequence data
    print("Please type the filename of the DNA sequence data: ")
    dna_filename = input()

    # open the file, or exit
    if os.path.exists(dna_filename):
        DNAFILE = open(dna_filename)
    else:
        print("Could not open file %s!\n" % dna_filename)
        exit()

    # Read the DNA sequence data from the file, and store it
    # into the list variable DNAs
    DNAs = DNAFILE.readlines()

    # Close the file
    DNAFILE.close()

    # From the lines of the DNA file,
    # put the DNA sequence data into a single string.
    DNA = ''.join(DNAs)

    # Remove newlines
    DNA = DNA.replace('\n', '')

    # Initialize the counts.
    count_of_A = 0
    count_of_C = 0
    count_of_G = 0
    count_of_T = 0
    errors = 0

    # In a loop, look at the base at each position of the string DNA
    # (not the list of lines DNAs), determine which of the four types
    # of nucleotides it is, and increment the appropriate count.
    for position in range(len(DNA)):
        base = DNA[position]
        if base == 'A':
            count_of_A += 1
        elif base == 'C':
            count_of_C += 1
        elif base == 'G':
            count_of_G += 1
        elif base == 'T':
            count_of_T += 1
        else:
            print("!!!!!!!! Error - I don't recognize this base: %s\n" % base)
            errors += 1

    # print the results
    print("A = %s\n" % count_of_A)
    print("C = %s\n" % count_of_C)
    print("G = %s\n" % count_of_G)
    print("T = %s\n" % count_of_T)
    print("errors = %s\n" % errors)

    # exit the program
    exit()

Output:

    Please type the filename of the DNA sequence data: small.dna
    !!!!!!!! Error - I don't recognize this base: V
    A = 40
    C = 27
    G = 24
    T = 17
    errors = 1

    #!/usr/bin/env python
    import os

    # Determining frequency of nucleotides

    # Get the name of the file with the DNA sequence data
    print("Please type the filename of the DNA sequence data: ")
    dna_filename = input()

    # open the file, or exit
    if os.path.exists(dna_filename):
        DNAFILE = open(dna_filename)
    else:
        print("Could not open file %s!\n" % dna_filename)
        exit()

    # Read the DNA sequence data from the file, and store it
    # into the list variable DNAs
    DNAs = DNAFILE.readlines()

    # Close the file
    DNAFILE.close()

    # From the lines of the DNA file,
    # put the DNA sequence data into a single string.
    DNA = ''.join(DNAs)

    # Remove newlines
    DNA = DNA.replace('\n', '')

    # Initialize the counts.
    count_of_A = 0
    count_of_C = 0
    count_of_G = 0
    count_of_T = 0
    errors = 0

    # Same counting loop as before, now driven by an explicit
    # index and a "while" test on the position.
    position = 0
    while position < len(DNA):
        base = DNA[position]
        if base == 'A':
            count_of_A += 1
        elif base == 'C':
            count_of_C += 1
        elif base == 'G':
            count_of_G += 1
        elif base == 'T':
            count_of_T += 1
        else:
            print("!!!!!!!! Error - I don't recognize this base: %s\n" % base)
            errors += 1
        position += 1

    # print the results
    print("A = %s\n" % count_of_A)
    print("C = %s\n" % count_of_C)
    print("G = %s\n" % count_of_G)
    print("T = %s\n" % count_of_T)
    print("errors = %s\n" % errors)

    # exit the program
    exit()

6. Writing Output to a File

    #!/usr/bin/env python
    import os

    # Determining frequency of nucleotides

    # Get the name of the file with the DNA sequence data
    print("Please type the filename of the DNA sequence data: ")
    dna_filename = input()

    # open the file, or exit
    if os.path.exists(dna_filename):
        DNAFILE = open(dna_filename)
    else:
        print("Could not open file %s!\n" % dna_filename)
        exit()

    # Read the DNA sequence data from the file, and store it
    # into the list variable DNAs
    DNAs = DNAFILE.readlines()

    # Close the file
    DNAFILE.close()

    # From the lines of the DNA file,
    # put the DNA sequence data into a single string.
    DNA = ''.join(DNAs)

    # Remove newlines
    DNA = DNA.replace('\n', '')

    # Count each nucleotide with the string method count,
    # accepting both upper and lower case. Anything else is an error.
    count_of_A = DNA.count('A') + DNA.count('a')
    count_of_C = DNA.count('C') + DNA.count('c')
    count_of_G = DNA.count('G') + DNA.count('g')
    count_of_T = DNA.count('T') + DNA.count('t')
    errors = len(DNA) - count_of_A - count_of_C - count_of_G - count_of_T

    # print the results
    print("A=%d C=%d G=%d T=%d errors=%d\n" % (count_of_A, count_of_C, count_of_G, count_of_T, errors))

    # Also write the results to a file called "countbase"
    outputfile = "countbase"
    COUNTBASE = open(outputfile, 'w')
    COUNTBASE.write("A=%d C=%d G=%d T=%d errors=%d\n" % (count_of_A, count_of_C, count_of_G, count_of_T, errors))
    COUNTBASE.close()

    # exit the program
    exit()

Output:

    Please type the filename of the DNA sequence data: small.dna
    A=40 C=27 G=24 T=17 errors=1

Contents of the file countbase:

    A=40 C=27 G=24 T=17 errors=1

7. Exercises

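5.1 Write a program that loops forever, with a test condition that evaluates to true every time through the loop.

5.2 Have the user enter two (short) DNA strings. Use the "+" operator to append the second string to the end of the first. Print the concatenated string, then print the second string starting at the position where the two were joined. For example, for the input "AAAA" and "TTTT", print: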

    AAAATTTT
        TTTT

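5.3 Write a program that prints all the numbers from 1 to 100.

5.4 Write a program that computes the reverse complement of a DNA strand.

5.5 Write a program that reports the percentage of hydrophobic amino acids in a protein sequence. (To find out which amino acids are hydrophobic, consult any introductory text on proteins, molecular biology, or cell biology.)

5.6 Write a program that checks whether two strings given as arguments are reverse complements of each other.

5.7 Write a program that reports the proportion of GC in a DNA sequence.

5.8 Write a program that replaces the base at a specified position in a DNA sequence.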

Beginning Perl for Bioinformatics

• Original link: https://kuaibao.qq.com/s/20180812G18SOB00?refer=cp_1026