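This is my first post here, and also the first program of this size that I have written. I honestly do not know what a programmer who writes "good, readable" code would expect. It is also my first program that will be used in a real-world application, and I am very new to Python. So, when reviewing, please give constructive criticism of this code and of how my code could be better in terms of Python and programming in general. I will try to explain the problem as well as I can in the next paragraph. If my code, logic, or problem needs any clarification, feel free to ask in the comments and I will do my best to clear up any doubts.
If you are looking for some kind of output, I cannot provide it yet, because the number of comparisons I need to make is about (38869 * 2588 * all possible combinations of each of the 2588) + the time taken to generate all the permutations. My machine is not capable of that.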
## Date : 2017-08-10
## Author : dadyodevil
## Contact : daddyodevil@gmail.com
##
## A python program to detect all indices of complimentary Micro-RNA(miRNA) target sites on Messenger-RNAs(mRNA)
##
## As an input, this program needs two lists -
## 1. A list of mRNAs where each entry is represented in a two line format:
## >hg19_refGene NM_032291 range=chr1:67208779-67210768...
## Sequence of mRNA
## 2. A list of miRNAs where each entry is represented in a two line format:
## >hsa-miR-576-3p MIMAT000...
## Sequence of miRNA
##
## Pre-requisites for the reader -
## 1. Understanding of programming concepts
## 2. A moderate understanding of the Python programming language version 2.7
## 3. Knowledge of terms regarding miRNA-mRNA target detection
import re

def extractSeed(miRNA):
    ## There are 4 seed regions with indices from 2-7, 2-8, 1-7 and 1-8
    miRNAfor6mer.append(miRNA[1:7][::-1])
    miRNAfor7mer.append(miRNA[1:8][::-1])
    miRNAfor7a1.append(miRNA[:7][::-1])
    miRNAfor8mer.append(miRNA[0:9][::-1])

def createCompliment(allCompliments, miRNA, wobbleCount, compliment):
    ## For the compliment, the conversions include a:t, u:a, g:c, c:g and for Wobble-Pairs, u:g and g:u
    if wobbleCount == 2:
        for letter in miRNA:
            if letter == 'a':
                compliment += 't'
            elif letter == 'c':
                compliment += 'g'
            elif letter == 'g':
                compliment += 'c'
            else:
                compliment += 'a'
        allCompliments.append(compliment)
    else:
        for index, letter in enumerate(miRNA):
            if letter == 'a':
                compliment += 't'
            elif letter == 'c':
                compliment += 'g'
            elif letter == 'g':
                createCompliment(allCompliments, miRNA[index+1:], wobbleCount + 1, compliment + "t")
                createCompliment(allCompliments, miRNA[index+1:], wobbleCount + 1, compliment + "c")
                compliment += 'c'
            elif letter == 'u':
                createCompliment(allCompliments, miRNA[index+1:], wobbleCount + 1, compliment + "g")
                createCompliment(allCompliments, miRNA[index+1:], wobbleCount + 1, compliment + "a")
                compliment += 'a'
    ## Now that all possibilities are generated, the duplicates need to be removed
    allCompliments = sorted(list(set(allCompliments)))

def checkForMatch(miRNACompliments, seedRegion, miRNAname):
    ## Each miRNA that is received by this function will be compared against the whole list of mRNAs and the matching indices will be saved
    ## Since the mRNA sequences are in alternate lines the sequences will be extracted as such and when matches are found, the name of the mRNA will be extracted from the index just before the current one
    for index in range(1, len(mRNA_List), 2):
        for entry in miRNACompliments:
            mRNA = mRNA_List[index]
            matchesStart = [m.start() for m in re.finditer(entry, mRNA)]
            if (len(matchesStart) > 0):
                mRNAname = mRNA_List[index-1][14:mRNA_List[index-1].find(" ",15)]
                matchesEnd = []
                for index2 in range(0, len(matchesStart)):
                    matchesEnd.append(matchesStart[index2] + len(entry))
                allindices = zip(matchesStart, matchesEnd)
                complimentarySiteList.append([miRNAname, mRNAname, seedRegion, allindices])

def prepareForMatch(miRNA, miRNAname):
    global miRNAfor6mer, miRNAfor7mer, miRNAfor7a1, miRNAfor8mer
    miRNAfor6mer, miRNAfor7mer, miRNAfor7a1, miRNAfor8mer = [], [], [], []
    ## First the seed sites will be extracted and reversed
    extractSeed(miRNA)
    ## Empty lists will be generated to store all the compliments
    miRNAfor6mer.append([])
    miRNAfor7mer.append([])
    miRNAfor7a1.append([])
    miRNAfor8mer.append([])
    ## Then the compliments will be generated from the seed regions along with at most two Wobble-Pairs
    miRNAfor6mer.append(createCompliment(miRNAfor6mer[1], miRNAfor6mer[0], 0, ""))
    miRNAfor7mer.append(createCompliment(miRNAfor7mer[1], miRNAfor7mer[0], 0, ""))
    miRNAfor7a1.append(createCompliment(miRNAfor7a1[1], miRNAfor7a1[0], 0, ""))
    miRNAfor8mer.append(createCompliment(miRNAfor8mer[1], miRNAfor8mer[0], 0, ""))
    ## After generating all possible compliments, they will be checked for matching sites
    checkForMatch(miRNAfor6mer[1], "6mer", miRNAname)
    checkForMatch(miRNAfor7mer[1], "7mer", miRNAname)
    checkForMatch(miRNAfor7a1[1], "7A1", miRNAname)
    checkForMatch(miRNAfor8mer[1], "8mer", miRNAname)

def Main():
    global mRNA_List, miRNA_List, complimentarySiteList
    miRNA_List = open('miRNA_list.txt').read().splitlines()
    mRNA_List = open('mRNA_list.txt').read().splitlines()
    complimentarySiteList = []
    ## Since the sequences are in every alternate line, the 'index' needs to be incremented by 2 to access only the sequences
    ## The miRNA lengths are also checked to see whether they are at least 8 nucleotides long; if they are not, they will not be checked
    for index in range(1,len(miRNA_List),2):
        miRNAname = miRNA_List[index-1][5:miRNA_List[index-1].find(' ')]
        if (len(miRNA_List[index]) < 8):
            print "%s at %d has insufficient length." %(miRNAname, index)
        else:
            prepareForMatch(miRNA_List[index].lower(), miRNAname)
    for entry in complimentarySiteList:
        print entry

if __name__ == '__main__':
    Main()

Posted on 2017-08-11 10:04:06
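createCompliment doesn't return anything; instead it mutates allCompliments. That mutation is lost when you assign to the name, as in the function's last line, allCompliments = sorted(list(set(allCompliments))). The assignment only rebinds the local name, and since you don't use it afterwards, it does nothing.

Rather than using two slow for loops in createCompliment, perform all the standard translations first, and then handle the special ones by looping through all combinations of the indexes that are special.

In other words, do the work of the if wobbleCount == 2: branch first, so that you have the basic translation, and then you don't have to care about the standard pairs while performing the special translations. If you have the input ccagaa, you can convert it to ggtctt without caring about the g. The simplest way to do this is str.translate.

After this you need to apply the special translations, which are g -> t and u -> g. However, since we have already performed the step above, they become c -> t and a -> g. To apply them you need the indexes of the c's and a's, which you can get with a list comprehension: [i for i, c in enumerate(rna) if c in 'ac'].

Next you want to convert every combination of one or two occurrences of these characters. This means we can use itertools.combinations to loop through all the combinations we want to change.

Finally we have to convert the values, so we copy the list with [:], which is a slice of the entire list. Then, looping through the chosen indexes, we convert the values in the list and yield a string version of the list.

An example: we start with rna = 'cagu' and convert it to the basic translation, gtca. We then get the indexes of all the special characters, which are [2, 3]. We go through all the combinations of these, which are [(2,), (3,), (2, 3)], and yield the converted words, which are gtta, gtcg and gttg.

It is easiest to think of yield as array.append. The following functions are therefore roughly the same:

    def fn_1():
        yield 1
        yield 2

    def fn_2():
        array = []
        array.append(1)
        array.append(2)
        return array

    def fn_3():
        return [1, 2]

    list(fn_1()) == fn_2() == fn_3()  # True

In check_for_match you are looping through a flattened list of two-element tuples, so you want to unflatten the list. [0, 1, 2, 3] would become [(0, 1), (2, 3)], which allows the simpler for a, b in ... looping style. To do this you can use the grouper recipe:

    def grouper(iterable, n, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * n
        return izip_longest(fillvalue=fillvalue, *args)

This works because [item] * n does not copy the item, and because of how iterators work. The former means that [item] * 2 is the same as [item, item], not [item, copy(item)]. This matters because it ensures that both entries are the same iterator. Using a single iterator several times is important because zip basically builds [(next(it), next(it)), (next(it), next(it)), ...]. The real thing is a little more complex, since it works with any number of iterators and knows when they are exhausted, but that is basically how it works.

So I would change your code to:
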
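import re
import string
import itertools

# Standard pairing: a -> t, c -> g, g -> c, u -> a
TRANS = string.maketrans('acgu', 'tgca')
# Wobble alternatives, applied after TRANS: c (from g) -> t, a (from u) -> g
CONVS = {'a': 'g', 'c': 't'}
# Seed names, in the same order as the slices in prepare_for_match
SEEDS = [
    "6mer",
    "7mer",
    "7A1",
    "8mer"
]

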
def create_compliments(rna):
    # Yield the plain compliment first
    rna = rna.translate(TRANS)
    yield rna
    # Then yield every variant with one or two wobble substitutions
    all_indexes = [i for i, c in enumerate(rna) if c in CONVS]
    rna = list(rna)
    for n in (1, 2):
        for indexes in itertools.combinations(all_indexes, n):
            t = rna[:]
            for index in indexes:
                t[index] = CONVS[t[index]]
            yield ''.join(t)


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.izip_longest(*args, fillvalue=fillvalue)


def check_for_match(mi_RNAs, seed, mi_RNA_name, m_RNA_list):
    mi_RNAs = list(mi_RNAs)
    for m_RNA_name, m_RNA in grouper(m_RNA_list, 2):
        m_RNA_name = m_RNA_name[14:m_RNA_name.find(" ", 15)]
        for entry in mi_RNAs:
            matches = [m.start() for m in re.finditer(entry, m_RNA)]
            if matches:
                all_indices = tuple(
                    (match, match + len(entry))
                    for match in matches
                )
                yield mi_RNA_name, m_RNA_name, seed, all_indices


def prepare_for_match(mi_RNA, mi_RNA_name, m_RNA_list):
    mi_RNAs = [
        mi_RNA[1:7][::-1],
        mi_RNA[1:8][::-1],
        mi_RNA[:7][::-1],
        mi_RNA[0:9][::-1]
    ]
    for mi_RNA, seed in zip(mi_RNAs, SEEDS):
        for entry in check_for_match(create_compliments(mi_RNA), seed, mi_RNA_name, m_RNA_list):
            yield entry


def main():
    mi_RNA_list = open('miRNA_list.txt').read().splitlines()
    m_RNA_list = open('mRNA_list.txt').read().splitlines()
    for index in range(1, len(mi_RNA_list), 2):
        mi_RNA_name = mi_RNA_list[index-1][5:mi_RNA_list[index-1].find(' ')]
        if (len(mi_RNA_list[index]) < 8):
            print "{} at {} has insufficient length.".format(mi_RNA_name, index)
        else:
            for entry in prepare_for_match(mi_RNA_list[index].lower(), mi_RNA_name, m_RNA_list):
                print tuple(entry)


if __name__ == '__main__':
    main()

https://codereview.stackexchange.com/questions/172673
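
The answer's first point, that assigning to a parameter name inside a function does not change the caller's list, can be seen in isolation with a small sketch. The function names below (dedupe_rebinding, dedupe_in_place, dedupe_returning) are hypothetical and exist only for illustration; they are not part of the question's or the answer's code. The snippet should behave the same under Python 2.7 and Python 3.

```python
def dedupe_rebinding(items):
    # Rebinds the local name only; the caller's list object is untouched.
    items = sorted(set(items))


def dedupe_in_place(items):
    # Slice assignment replaces the contents of the same list object,
    # so the caller sees the change.
    items[:] = sorted(set(items))


def dedupe_returning(items):
    # Usually the clearest option: build a new list and return it.
    return sorted(set(items))


data = ['gtta', 'gtca', 'gtta']
dedupe_rebinding(data)
print(data)   # still contains the duplicate
dedupe_in_place(data)
print(data)   # now sorted and free of duplicates
```

Returning a value (or yielding, as the rewritten create_compliments does) tends to be easier to follow than mutating an argument, because the data flow is visible at the call site.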
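
The translate-then-combine idea from the answer can also be tried on its own. The following is a minimal sketch that assumes Python 3, where string.maketrans no longer exists and str.maketrans is used instead; the listing above targets Python 2.7. The names wobble_variants, WOBBLE and max_wobbles are hypothetical and chosen for this illustration only.

```python
import itertools

# Standard pairing: a -> t, c -> g, g -> c, u -> a
TRANS = str.maketrans('acgu', 'tgca')
# Wobble alternatives, applied after TRANS: c (from g) -> t, a (from u) -> g
WOBBLE = {'c': 't', 'a': 'g'}


def wobble_variants(seed, max_wobbles=2):
    """Yield the plain compliment, then every variant with 1..max_wobbles swaps."""
    base = seed.translate(TRANS)
    yield base
    # Positions whose character has a wobble alternative
    positions = [i for i, ch in enumerate(base) if ch in WOBBLE]
    for n in range(1, max_wobbles + 1):
        for chosen in itertools.combinations(positions, n):
            chars = list(base)  # fresh copy for each combination
            for i in chosen:
                chars[i] = WOBBLE[chars[i]]
            yield ''.join(chars)


print(list(wobble_variants('cagu')))
# ['gtca', 'gtta', 'gtcg', 'gttg']
```

The output matches the worked example in the answer: the basic translation gtca, followed by gtta, gtcg and gttg. Because each combination of positions is visited exactly once, no duplicates are produced, so the sorted(list(set(...))) step from the original code is not needed here.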