
Q: Perl program to simulate RNA synthesis

Stack Overflow user

Asked 2010-11-06 05:06:13

3 answers, 1.1K views, 0 followers, 3 votes

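I'm looking for suggestions on how to approach my Perl homework assignment: writing an RNA synthesis program. I've summarized and outlined the program below. Specifically, I'm looking for feedback on the following blocks (numbered for easy reference). I've read chapter 6 of Andrew Johnson's Elements of Programming with Perl (a great book). I've also read the perlfunc and perlop pages, but nothing there tells me where to start.

Program description: the program should read an input file named on the command line, transcribe it into RNA, and then translate the RNA into a sequence of uppercase one-letter amino acid names.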

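1. Accept the file specified on the command line. Here I will use the <> operator.
2. Check to make sure the file only contains acgt, or die: if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }
3. Transcribe the DNA to RNA (every A replaced by U, T replaced by A, C replaced by G, G replaced by C). Not sure how to do this.
4. Take this transcription and break it into 3-character 'codons', starting at the first occurrence of "AUG". Not sure, but I'm thinking this is where I will start a %hash variable?
5. Take the 3-character "codons" and give them a single-letter symbol (an uppercase one-letter amino acid name). Assign a key a value using... (there are 70 possibilities here, so I'm not sure where to store them or how to access them).
6. If a gap is encountered, a new line is started and the process is repeated. Not sure, but we can assume that gaps are multiples of three.
7. Am I on the right track? Is there a Perl function I'm overlooking that could simplify the main program?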

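Notes

It must be a self-contained program (the codon names and symbols are stored in the program itself).

Whenever the program reads a codon that has no symbol (this is a gap in the RNA), it should start a new output line and resume at the next occurrence of "AUG". For simplicity, we can assume that gaps are always multiples of three.

Before I spend more time on research, I'd like confirmation that I'm taking the right approach. Thanks for taking the time to read this and share your expertise!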

perl

hash

bioinformatics

3 Answers

Stack Overflow user

Accepted answer

Posted 2010-11-06 07:19:52

1. here I will use the <> operator

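OK, so your plan is to read the file line by line. Don't forget to chomp each line as you go, or you'll end up with newline characters in your sequence.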
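A minimal sketch of that loop, assuming the sequence is small enough to hold in one string. The sub name read_sequence is hypothetical, and an in-memory filehandle stands in for the input file so the example runs on its own; with the file named on the command line you would write while (my $line = <>) instead.

```perl
use strict;
use warnings;

# Read every line from $fh, chomping each one so that no "\n" characters
# end up inside the concatenated sequence.
sub read_sequence {
    my ($fh) = @_;
    my $seq = '';
    while (my $line = <$fh>) {   # Perl adds an implicit defined() here
        chomp $line;             # strip the trailing newline
        $seq .= $line;
    }
    return $seq;
}

# In-memory filehandle standing in for the real input file.
my $data = "ATGC\nGGTA\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print read_sequence($fh), "\n";   # prints ATGCGGTA
```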

2. Check to make sure the file only contains acgt or die

if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }

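In a while loop, the <> operator puts the line it reads into the special variable $_, unless you assign it explicitly (my $line = <>).

In the code above, you read a line from the file and throw it away. You need to keep that line.

Also, the ne operator compares two strings, not a string and a regular expression. You want the !~ operator here (or the =~ operator with a negated character class, [^acgt]). If you need the test to be case-insensitive, look at the i flag for regular-expression matching.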
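Putting those points together, here is a small sketch that keeps the line in a variable, chomps it, and rejects it using =~ with a negated character class. The sub name check_dna is hypothetical; the /i flag makes the check accept both acgt and ACGT.

```perl
use strict;
use warnings;

# Die unless $line contains only nucleotides. =~ with the negated class
# [^acgt] finds any other character; /i makes the test case-insensitive.
sub check_dna {
    my ($line) = @_;
    die "usage: file must only contain nucleotides\n" if $line =~ /[^acgt]/i;
    return 1;
}

my $line = "ACGTacgt\n";   # a line as it would come from <>
chomp $line;               # without chomp, the "\n" would fail the check
check_dna($line);
print "valid\n";
```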

3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C).

As GWW said, check your biology. T->U is the only step in transcription. You'll find the tr (transliteration) operator useful here.
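For example, a sketch of transcription with tr///. The uc call reflects an assumption that the input may be lowercase acgt (as the question suggests), so that the output matches uppercase codon keys.

```perl
use strict;
use warnings;

# Transcription only replaces thymine with uracil; tr/// does it in one pass.
# uc() first so that lowercase input (acgt) is handled too (an assumption
# about the input format).
sub transcribe {
    my ($dna) = @_;
    my $rna = uc $dna;
    $rna =~ tr/T/U/;
    return $rna;
}

print transcribe("atgcatt"), "\n";   # prints AUGCAUU
```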

4. Take this transcription & break it into 3 character 'codons' starting at the first occurrence of "AUG"

not sure but I'm thinking this is where I will start a %hash variables?

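I would use a buffer here. Define a scalar outside the while(<>) loop. Use index to look for "AUG". If you don't find it, put the last two bases into that scalar (you can use substr $line, -2, 2). On the next iteration of the loop, append the line to those two bases (with .=) and test for "AUG" again. When you get a hit, you'll know where it is, so you can mark the spot and start translating.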
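A sketch of that buffering idea, assuming the lines have already been chomped and transcribed to RNA. The sub seq_from_first_aug and its list-of-lines argument are hypothetical conveniences that keep the example self-contained.

```perl
use strict;
use warnings;

# Find the first "AUG" even when it is split across two input lines, and
# return the RNA from that AUG onward (undef if there is no AUG).
sub seq_from_first_aug {
    my @lines  = @_;
    my $buffer = '';            # defined outside the loop
    my $found;                  # undef until AUG is seen
    for my $line (@lines) {
        if (defined $found) {   # already started: just collect the rest
            $found .= $line;
            next;
        }
        $buffer .= $line;       # append the new line to the leftover bases
        my $pos = index $buffer, 'AUG';
        if ($pos >= 0) {
            $found = substr $buffer, $pos;   # mark the spot
        }
        else {
            # keep only the last two bases; they may begin an AUG
            $buffer = substr($buffer, -2, 2) if length($buffer) > 2;
        }
    }
    return $found;
}

print seq_from_first_aug('CCCA', 'UGGCU'), "\n";   # prints AUGGCU
```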

5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)

Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)

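Again, as GWW said, build a hash table:

%codons = ( AUG => 'M', ... );

Then you can use, for example, split to build an array from the current line, take three elements at a time, and look up the right amino-acid code in the hash table.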
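A sketch of that lookup, three bases at a time. The table is deliberately partial (a handful of real entries from the standard genetic code); a complete program needs all 64 codons. translate_rna is a hypothetical name, and splice is one way to take three elements at a time from the split array.

```perl
use strict;
use warnings;

# Partial codon table: real entries, but not the full 64-codon code.
my %codons = (
    AUG => 'M', UUU => 'F', GGC => 'G',
    UGG => 'W', AAA => 'K', GCU => 'A',
);

# Split the RNA into single bases, then take them three at a time with
# splice and look each codon up in %codons (unknown codons are skipped).
sub translate_rna {
    my ($rna) = @_;
    my @bases   = split //, $rna;
    my $protein = '';
    while (@bases >= 3) {
        my $codon = join '', splice(@bases, 0, 3);
        $protein .= $codons{$codon} if exists $codons{$codon};
    }
    return $protein;   # leftover bases (fewer than 3) are ignored
}

print translate_rna('AUGUUUGGC'), "\n";   # prints MFG
```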

6. If a gap is encountered a new line is started and process is repeated

not sure but we can assume that gaps are multiples of threes.

See above. You can test for a gap with exists $codons{$current_codon}.
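A sketch of the gap rule from the question's notes, assuming the codons are already in frame: any codon missing from %codons closes the current output line, and translation resumes at the next AUG. The partial table, the sub name translate_with_gaps, and the use of NNN as an example gap codon are all assumptions for illustration.

```perl
use strict;
use warnings;

# Partial table (real entries, not the full 64-codon code).
my %codons = (AUG => 'M', UUU => 'F', GGC => 'G', AAA => 'K');

# Walk the codons in frame. A codon missing from %codons is a gap:
# close the current output line and wait for the next AUG.
sub translate_with_gaps {
    my @codons_in = @_;
    my @lines;
    my $current;                            # undef while waiting for AUG
    for my $codon (@codons_in) {
        if (!defined $current) {
            next unless $codon eq 'AUG';    # skip until a start codon
            $current = '';
        }
        if (exists $codons{$codon}) {
            $current .= $codons{$codon};
        }
        else {                              # gap: start a new output line
            push @lines, $current if length $current;
            undef $current;
        }
    }
    push @lines, $current if defined $current && length $current;
    return @lines;
}

print "$_\n" for translate_with_gaps(qw(GGC AUG UUU NNN GGC AUG AAA));
# prints MF, then MK on the next line
```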

7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?

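You know, looking at all of the above, this seems too complicated. I built a couple of building blocks, the subroutines read_codon and translate, and I think they make the program's logic much clearer.

I know this is a homework assignment, but I think it may help you to see another possible approach: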

use warnings; use strict;
use feature 'state';


# read_codon works by using the state feature introduced in Perl 5.10.
# Both @buffer and $handle are 'state' variables of this function, which
# lets it hide the line-by-line reading of the file from the caller.
# Both are initialized the first time read_codon is called.
# Because $handle is a state variable, the file is opened only once and
# the current read position is never reset. Similarly, @buffer always
# holds whatever was left over from the previous call.
# While @buffer contains fewer than 3 bases, we read a new line, remove
# the "\n" character, split it into single bases and push them onto the
# end of @buffer.
# If we hit EOF on $handle, the file is exhausted (any 1-2 leftover bases
# cannot form a codon), so we return undef.
# Otherwise we take the first 3 bases from @buffer, join them into a
# string, transcribe it and return it.

sub read_codon {
    my ($file) = @_;

    state @buffer;
    state $handle;
    if (!$handle) {
        open $handle, '<', $file or die "Cannot open $file: $!";
    }

    while (@buffer < 3) {
        defined(my $new_line = <$handle>) or return;
        chomp $new_line;
        push @buffer, split //, $new_line;
    }

    return transcribe(join '', splice(@buffer, 0, 3));
}

sub transcribe {
    my ($codon) = @_;
    $codon = uc $codon;     # accept lowercase acgt input as well
    $codon =~ tr/T/U/;
    return $codon;
}


# translate also uses the state feature introduced in Perl 5.10.
# $TRANSLATE is initialized to 0 only once (the first time the sub is
# called); as codons are passed in, the sub updates it according to the
# start codon (AUG) and the stop codons (UAA, UGA, UAG).
# If the current state is 'translating', the sub returns the matching
# amino acid from the %codes table, if any; otherwise it returns undef.
# This gives the caller a simple way to decide whether to print an amino
# acid: if there is none, the sub returns undef.
# %codes is not really 'state', so it is initialized once in a block that
# is visible to the sub but separate from the rest of the program, since
# it is 'private' to the sub. The assignment runs at run time, so this
# block must appear before the main program first calls translate.

{
    my %codes = (
        AUG => 'M',
        # ... (rest of the codon table)
    );

    sub translate {
        my ($codon) = @_ or return;

        state $TRANSLATE = 0;

        $TRANSLATE = 1 if $codon eq 'AUG';
        $TRANSLATE = 0 if $codon =~ m/^U(?:AA|GA|AG)$/;

        return $TRANSLATE ? $codes{$codon} : undef;
    }
}

Votes: 5

Stack Overflow user

Posted 2010-11-06 05:16:51

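I can give you a few hints on some of your points.

I think your first goal should be to parse the file character by character, making sure each character is valid and grouping them into sets of three nucleotides, before you work on your other goals.

I also think your biology is a little off: when you transcribe DNA into RNA, you need to think about which strand is involved. You probably don't need to "complement" your bases in the transcription step.

2. You should check this as you parse the file's characters.

3. You could do this with a loop and some if statements, or with a hash.

4. This will most likely be done with a counter as you read the file character by character, since you need to insert a space after every third character.

5. This would be a good place to use a hash based on an amino-acid codon table.

6. You'll have to look for the gap character while parsing the file. This seems to contradict your second requirement, since the program description says your text can only contain ATGC.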
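A sketch of hints 2 and 4 above: read one character at a time with getc, validate it, and use a counter to put a space after every third nucleotide. The sub name group_codons is hypothetical, newlines are assumed to be the only non-nucleotide characters to skip, and an in-memory filehandle stands in for the input file.

```perl
use strict;
use warnings;

# Read $fh one character at a time, die on anything that is not a
# nucleotide, and put a space after every third base.
sub group_codons {
    my ($fh) = @_;
    my $out   = '';
    my $count = 0;
    while (defined(my $char = getc $fh)) {
        next if $char eq "\n";                  # ignore line breaks
        die "invalid character '$char'\n" unless $char =~ /^[acgt]$/i;
        $out .= uc $char;
        $out .= ' ' if ++$count % 3 == 0;       # space after each codon
    }
    $out =~ s/ \z//;                            # drop the trailing space
    return $out;
}

my $data = "atgtt\ncgga\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print group_codons($fh), "\n";   # prints ATG TTC GGA
```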

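There are plenty of Perl functions that could make this easier, and there are also Perl modules such as BioPerl. But I think using some of them might defeat the purpose of your assignment.

Votes: 3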

Stack Overflow user

Posted 2010-11-06 05:17:33

Take a look at BioPerl and browse the source of its modules for pointers on how to approach this problem.

Votes: 1

Page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/4112003
