我正在尝试使用Perl将句点和空格后出现的所有小写字母大写。这是一个输入的示例:
...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...这是我希望看到的输出:
...so, that's our art. The 4 of us can now have a dialog. We can have a conversation. We can speak to... 我尝试了多个正则表达式,但没有太多成功--例如:
$currentLine =~ s/\.\s([a-z])/\. \u$1/g;或
$currentLine =~ s/([\.!?]\s*)(\w)/$1\U$2/g;但是我没有得到预期的结果。请帮帮我!
更新
正如有人指出的那样,为了提供背景信息,问题可能出在其他地方。正则表达式在这个小脚本的上下文中使用,这个小脚本除了创建这篇文章的步骤之外,还做了一些事情。我在从视频隐藏字幕中获得的长SRT文件上运行它。再次感谢您的帮助。
#! perl
use strict;
use warnings;
my $filename = $ARGV[0];
open(INPUT_FILE, $filename)
or die "Couldn't open $filename for reading!";
while (<INPUT_FILE>) {
my $currentLine = $_;
# Remove empty lines and lines that start with digits
if ($currentLine =~ /^[\s+|\d+]/){
next;
}
# Remove all carriage returns
$currentLine =~ s/\R$/ /;
# Convert all letters to lower case
$currentLine =~ s/([A-Z])/\l$1/g;
# Capitalize after period <= STEP THAT DOES NOT WORK
$currentLine =~ s/\.\s([a-z])/\. \u$1/g;
print $currentLine;
}
close(INPUT_FILE);发布于 2016-01-03 12:51:50
尝尝这个
使用look look捕获模式,并使用\U将字符串的开头更改为大写
$str ="...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...";
$str =~ s/(?<=\w\.\s)(\w)/\U$1/g;
print $str或者尝试通过替换来保持单词的\K。
$str =~ s/\w\.\s\K(\w)/\U$1/g;发布于 2016-01-06 11:39:41
其中一个问题是代码:
if ($currentLine =~ /^[\s+|\d+]/){
next;
}与注释相反,这将忽略以空格、数字、加号或管道符号开头的行。这可能会让你走上错误的道路。你可能想写下:
next if /^(\s+$|\d)/;如果整行都是空格,或者第一个字符是数字,则跳过一行。
您可以简化您的循环,并将其泛化,如下所示:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
# Remove empty lines and lines that start with digits. sometimes
next if /^(\s+$|\d)/;
# Remove all carriage returns. forever
s/\R$//;
# Convert all letters to lower case. always
s/([A-Z])/\l$1/g;
# Capitalize after period <=... STEP THAT DOES NOT WORK
s/\.\s([a-z])/\. \u$1/g;
print "$_\n";
}在自身运行时,输出为:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
# remove empty lines and lines that start with digits. Sometimes
next if /^(\s+$|\d)/;
# remove all carriage returns. Forever
s/\r$//;
# convert all letters to lower case. Always
s/([a-z])/\l$1/g;
# capitalize after period <=... Step that does not work
s/\.\s([a-z])/\. \u$1/g;
print "$_\n";
}请注意,要使转换后的脚本工作,您需要在替换操作上使用/gi作为修饰符(而不是/g)。不过,这段代码还有很大的改进空间。
测试发生了什么的一个基本方法是在每个步骤中打印所有内容。
while (<INPUT_FILE>) {
print "## $_";
my $currentLine = $_;
# Remove empty lines and lines that start with digits
if ($currentLine =~ /^[\s+|\d+]/){
print "#SKIP# $currentLine";
next;
}
# Remove all carriage returns
$currentLine =~ s/\R$/ /;
print "#EOL# $currentLine##\n";
# Convert all letters to lower case
$currentLine =~ s/([A-Z])/\l$1/g;
print "#LC# $currentLine##\n";
# Capitalize after period <= STEP THAT DOES NOT WORK
$currentLine =~ s/\.\s([a-z])/\. \u$1/g;
print "#CAPS# $currentLine##\n";
print $currentLine; # Needs a newline!
}这会告诉你发生了什么,以及出了什么问题。请注意,将通用EOL (\R)替换为空白意味着输出不会以换行符结束。这也是一个糟糕的想法-这就是为什么我生成的输出以换行符结尾;要么是从文件中读取的输出,要么是在被删除的输出之后添加一个。
此外,当你需要一个明确的文件句柄时,你应该避免使用ALL_CAPS文件句柄,而使用词法句柄。
open my $fh, '<', $filename
or die "Couldn't open $filename for reading!";在错误消息中包含文件名的工作做得很好(尽管添加$!来报告系统错误消息也是一个好主意)。
发布于 2016-01-03 12:44:51
# (char)(char)(char) (char)(char)(char) Uppercase the 3rd
$str =~ s/(\.)(\s)(\w)/$1$2\U$3/g;
print $str
...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...
...so, that's our art. The 4 of us can now have a dialog. We can have a conversation. We can speak to...https://stackoverflow.com/questions/34573167
复制相似问题