
Q: Perl: return the matched string, ignoring the trailing delimiter if present

Stack Overflow user

Asked on 2014-03-09 06:38:22

Answers: 2, Views: 158, Followers: 0, Votes: 0

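I'm trying to write a pattern match in Perl that checks for non-whitespace characters at the beginning of each line read from a file and returns the first matching word.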

The problem is that sometimes the word ends with ':' and sometimes it doesn't.

For example:

Suppose I have a file with the following content. Sometimes it has different content instead. The file is populated automatically.

some0 Loren Posem:is some color::and some foo bar with 1023:4632
      some more content added to the file
some3 Loren Posem:is some color::and some foo bar with 1023:4632
      some more content added to the file

Alternate content:

some1: Loren Posem:is some will be different with some number 5423:32
      some more content added to the file
some3: Loren Posem:is some will be different with some number 5423:32
      some more content added to the file

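I only want to extract the first word of each line from this file. But if the file has the alternate content, I still want the first word without the trailing ':'.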

I only need the pattern-matching part. This is what I have so far:

foreach ... 
    if  (/^(\S+):/) { 
        print $1;
    }

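/* If I use the pattern above, I get the first word of the alternate content (some1 and some3) without the trailing ":", but with the original content the match fails and $1 is not set. */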

But if I use:

foreach ... 
    if  (/^(\S+)/) { 
        print $1;
    }

/* Now the alternate content no longer comes out right: $1 keeps the trailing ":" (some1:, some3:). */
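To see why neither attempt works on its own, here is a small sketch that runs both patterns over one line of each kind (copied from the example files above) and records what `$1` holds, or that there was no match. The hash names are illustrative only.

```perl
use strict;
use warnings;

# One line from each variant of the input file (copied from the examples above).
my %sample = (
    original  => 'some0 Loren Posem:is some color::and some foo bar with 1023:4632',
    alternate => 'some1: Loren Posem:is some will be different with some number 5423:32',
);

my %result;
for my $kind ( sort keys %sample ) {
    my $line = $sample{$kind};

    # Attempt 1: requires a colon immediately after the word, so it fails
    # outright on lines where the word is followed by a space.
    $result{$kind}{colon} = $line =~ /^(\S+):/ ? $1 : '(no match)';

    # Attempt 2: takes every leading non-space character, colon included.
    $result{$kind}{no_colon} = $line =~ /^(\S+)/ ? $1 : '(no match)';

    print "$kind: /^(\\S+):/ => $result{$kind}{colon}, /^(\\S+)/ => $result{$kind}{no_colon}\n";
}
```

For the original line the first pattern finds no match and the second returns `some0`; for the alternate line the first returns `some1` and the second returns `some1:`.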

Any suggestions?

regex

perl

pattern-matching

2 Answers

Stack Overflow user

Accepted answer

Answered on 2014-03-09 07:00:30

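If you have a lot of data to process, using `split` (with a limit) to cut the line at the first colon or whitespace character can be faster than a capturing regex (about 14% in the benchmark below). In this case: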

foreach ...
    if ( my $firstWord = ( split /[:\s]/, $_, 2 )[0] ) {
        print $firstWord, "\n";
    }
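A standalone sketch of the `split` approach, with the elided `foreach ...` replaced by a loop over a few sample lines (an assumption made only so it runs on its own). It also shows why the `if` matters: on an indented continuation line the pattern matches at position 0, so `split` returns an empty leading field, which is false, and the line is skipped.

```perl
use strict;
use warnings;

# Sample input: both header variants plus an indented continuation line.
my @lines = (
    "some0 Loren Posem:is some color::and some foo bar with 1023:4632\n",
    "      some more content added to the file\n",
    "some1: Loren Posem:is some will be different with some number 5423:32\n",
);

my @first_words;
for (@lines) {
    # Split into at most two fields at the first ':' or whitespace character.
    # An indented line yields an empty first field, so the test is false.
    if ( my $firstWord = ( split /[:\s]/, $_, 2 )[0] ) {
        push @first_words, $firstWord;
        print $firstWord, "\n";
    }
}
```

One caveat of the truthiness test: a first word that is literally `0` would also be treated as false and skipped.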

Benchmark:

use strict;
use warnings;
use Benchmark qw/cmpthese/;

my @data = <DATA>;

sub _split {
    for (@data) {
        if ( my $firstWord = ( split /[:\s]/, $_, 2 )[0] ) {
            #print $firstWord, "\n";
        }
    }
}

sub _regex {
    for (@data) {
        if ( my ($firstWord) = /^([^:\s]+)/ ) {
            #print $firstWord, "\n";
        }
    }
}

cmpthese(
    -5,
    {
        _split => sub { _split() },
        _regex => sub { _regex() }
    }
);

__DATA__
some0 Loren Posem:is some color::and some foo bar with 1023:4632
some3 Loren Posem:is some color::and some foo bar with 1023:4632
some1: Loren Posem:is some will be different with some number 5423:3
some3: Loren Posem:is some will be different with some number 5423:32

Output (`cmpthese` lists entries from slowest to fastest, so the faster one is lower in the table):

           Rate _regex _split
_regex 396843/s     --   -12%
_split 450546/s    14%     --

That said, you may find the regex more readable.
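Whichever version you pick, it is worth confirming that both compute the same words before comparing their speed. A minimal sketch, assuming the four `__DATA__` lines from the benchmark, that collects the first word with each technique and compares the two lists:

```perl
use strict;
use warnings;

# The four lines from the benchmark's __DATA__ section.
my @data = (
    "some0 Loren Posem:is some color::and some foo bar with 1023:4632\n",
    "some3 Loren Posem:is some color::and some foo bar with 1023:4632\n",
    "some1: Loren Posem:is some will be different with some number 5423:3\n",
    "some3: Loren Posem:is some will be different with some number 5423:32\n",
);

my ( @by_split, @by_regex );
for (@data) {
    # Same expression as in _split().
    push @by_split, ( split /[:\s]/, $_, 2 )[0];
    # Same pattern as in _regex().
    push @by_regex, $1 if /^([^:\s]+)/;
}

print "split: @by_split\n";
print "regex: @by_regex\n";
print "same: ", ( "@by_split" eq "@by_regex" ? "yes" : "no" ), "\n";
```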

Hope this helps!

Votes: 1

Stack Overflow user

Answered on 2014-03-09 09:15:12

Greedily match everything except whitespace and colons:

while (<DATA>) {
    if  (/^([^:\s]+)/) { 
        print "$1\n";
    }
}

__DATA__
some0 Loren Posem:is some color::and some foo bar with 1023:4632
      some more content added to the file
some3 Loren Posem:is some color::and some foo bar with 1023:4632
      some more content added to the file
Alternate content:

some1: Loren Posem:is some will be different with some number 5423:32
      some more content added to the file
some3: Loren Posem:is some will be different with some number 5423:32
      some more content added to the file
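The question reads from a file rather than `__DATA__`. A minimal sketch of the same regex wrapped in a sub that takes any readable filehandle; the sub name `first_words` and the in-memory handle used for the demo are illustrative assumptions, and with a real file you would `open` it by path instead. Note that the answer's `__DATA__` also contains the label line `Alternate content:`, which this pattern matches too (printing `Alternate`).

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading word (without a trailing ':')
# of every line that starts with a non-space, non-colon character.
sub first_words {
    my ($fh) = @_;
    my @words;
    while ( my $line = <$fh> ) {
        push @words, $1 if $line =~ /^([^:\s]+)/;
    }
    return @words;
}

# Demo input; an in-memory filehandle stands in for the real file.
my $text = <<'END';
some0 Loren Posem:is some color::and some foo bar with 1023:4632
      some more content added to the file
some1: Loren Posem:is some will be different with some number 5423:32
      some more content added to the file
END

open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @words = first_words($fh);
close $fh;

print "$_\n" for @words;
```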

Votes: 2

Original page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/22279216
