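I am trying to write a pattern match in Perl that checks for non-whitespace characters at the start of each line read from a file and returns the first matched word.
The problem is that sometimes the word ends with ':' and sometimes it does not.
For example:
Suppose I have a file with the following content. Sometimes it has the alternate content instead. The file is populated automatically.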
some0 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
some3 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
Alternate content:
some1: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
some3: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
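Now I want to extract only the first word of each line from this file. If the file has the alternate content, I still want the first word, without the trailing ':'.
I only need the pattern matching part. This is what I have so far.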
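foreach ...
    if (/^(\S+):/) {
        print $1;
    }
/* With the pattern above I get the first word from the alternate content, i.e. some1 and some3, without the trailing ":", but with the original content the pattern does not match and $1 is not set. */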
But if I use
foreach ...
    if (/^(\S+)/) {
        print $1;
    }
/* Now the alternate content does not match the way I want (the trailing ":" is captured too). */
Any suggestions?
Posted on 2014-03-09 07:00:30
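If you have a lot of data to process, using split (with a limit set) to get the first word can be faster than a capturing regex; in the benchmark below it comes out about 14% ahead. In this case: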
foreach ...
    if ( my $firstWord = ( split /[:\s]/, $_, 2 )[0] ) {
        print $firstWord, "\n";
    }
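To make the role of the limit concrete, here is a minimal sketch. The helper name `first_word_split` is hypothetical and not part of the answer. With a limit of 2, split stops at the first colon or whitespace character and leaves the rest of the line unsplit in the second field.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before the first colon or
# whitespace character. The limit of 2 means split produces at most
# two fields, so the remainder of the line is never scanned further.
sub first_word_split {
    my ($line) = @_;
    my ( $first, $rest ) = split /[:\s]/, $line, 2;
    return $first;
}

print first_word_split("some0 Loren Posem:is some color"), "\n";
print first_word_split("some1: Loren Posem:is some will"), "\n";
```

Note that a line starting with a colon or whitespace yields an empty first field, which is false in Perl, so the `if` in the snippet above skips such lines. A first word of "0" would be skipped for the same reason.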
Benchmark:
use strict;
use warnings;
use Benchmark qw/cmpthese/;
my @data = <DATA>;
sub _split {
    for (@data) {
        if ( my $firstWord = ( split /[:\s]/, $_, 2 )[0] ) {
            #print $firstWord, "\n";
        }
    }
}
sub _regex {
    for (@data) {
        if ( my ($firstWord) = /^([^:\s]+)/ ) {
            #print $firstWord, "\n";
        }
    }
}
cmpthese(
    -5,
    {
        _split => sub { _split() },
        _regex => sub { _regex() }
    }
);
__DATA__
some0 Loren Posem:is some color::and some foo bar with 1023:4632
some3 Loren Posem:is some color::and some foo bar with 1023:4632
some1: Loren Posem:is some will be different with some number 5423:3
some3: Loren Posem:is some will be different with some number 5423:32
Output (the faster entry is lower in the table):
           Rate _regex _split
_regex 396843/s     --   -12%
_split 450546/s    14%     --
However, you may find the regex more readable.
Hope this helps!
Posted on 2014-03-09 09:15:12
Greedily match everything that is not whitespace or a colon:
while (<DATA>) {
    if (/^([^:\s]+)/) {
        print "$1\n";
    }
}
__DATA__
some0 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
some3 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
Alternate content:
some1: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
some3: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
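The same pattern can be wrapped in a small helper, sketched below. Both sub names are hypothetical. The second sub is an assumption that goes beyond the answer: it is a variant for the case where a word may contain colons and only a colon directly before whitespace or the end of the line should be dropped.

```perl
use strict;
use warnings;

# Hypothetical helper using the answer's pattern: the word ends at the
# first colon or whitespace character. Returns undef when nothing matches.
sub first_word {
    my ($line) = @_;
    return $line =~ /^([^:\s]+)/ ? $1 : undef;
}

# Hypothetical variant (not from the answer): keep colons inside the
# word and drop only one colon that directly precedes whitespace or the
# end of the line.
sub first_word_keep_inner {
    my ($line) = @_;
    return $line =~ /^(\S+?):?(?:\s|$)/ ? $1 : undef;
}

# Print both results side by side for a few sample lines.
for my $line ( "some0 Loren Posem", "some1: Loren Posem", "a:b: rest" ) {
    printf "%-20s %-6s %s\n", $line, first_word($line),
        first_word_keep_inner($line);
}
```

For the sample lines in the question the two subs should agree. They differ only on input such as "a:b: rest", where the first returns "a" and the second returns "a:b".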
https://stackoverflow.com/questions/22279216