Q: How to match Chinese characters with a Perl regex

Stack Overflow user

Asked 2009-12-23 17:24:52

3 answers · 4.1K views · 0 followers · 2 votes

I need to match some Chinese characters in UTF-8 encoded HTML. I wrote some test code, shown below:

#! /usr/bin/perl

use strict;
use LWP::UserAgent;
use Encode;

my $ua = new LWP::UserAgent;

my $request = HTTP::Request->new('GET');
my $url = 'http://www.boc.cn/sourcedb/whpj/';
$request->url($url);

my $res = $ua->request($request) ;

my $str_chinese =   encode("utf8" ,"英磅" ) ;  
# my $str_chinese = "英磅" ;


my $str_english = "English" ;
#my $html = decode("utf8" , $res->content) ;
my $html = $res->content ; 

if ( $html =~ /$str_chinese/ ) {
     print "chinese word matched" ;
}else {
     print "chinese word unmatched\n" ;
}

if ( $html =~ /$str_english/i ) {
    print "english word matched\n" ;
}else {
    print "english word unmatched\n" ;
}

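The output shows that the script fails to match Chinese characters that are present in the HTML. Can you give me some hints on how to solve this?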

regex

perl

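3 Answers

Stack Overflow user

Accepted answer

Posted 2009-12-23 21:29:15

You should use the decoded_content method of the HTTP::Message class instead of content. It decodes the response body according to the charset the server declares, so there is no need to decode it manually.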

#!/usr/bin/env perl
use utf8;
use strict;
use LWP::UserAgent;

my $html = LWP::UserAgent->new
    ->get('http://www.boc.cn/sourcedb/whpj/')
    ->decoded_content;

my $str_chinese = '首页';
my $str_english = 'English';

if ($html =~ /$str_chinese/) {
    print "chinese word matched\n";
} else {
    print "chinese word unmatched\n";
}

if ($html =~ /$str_english/i) {
    print "english word matched\n";
} else {
    print "english word unmatched\n";
}

Output:

chinese word matched
english word matched
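
To see why decoded_content makes the difference without touching the network, here is a minimal offline sketch. The HTML fragment and the matches helper are made up for illustration; $raw_html stands in for what $res->content returns (bytes) and $html for what $res->decoded_content returns (characters), assuming the page is served as UTF-8.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                        # string literals below are character strings
use Encode qw(encode decode);

# Hypothetical stand-in for $res->content: the page as raw UTF-8 bytes.
my $raw_html = encode('UTF-8', '<a href="/">首页</a> <a href="/en/">English</a>');

# Hypothetical stand-in for $res->decoded_content: the same page as characters.
my $html = decode('UTF-8', $raw_html);

my $word = '首页';

# Returns 1 if $pattern occurs literally in $text, 0 otherwise.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ /\Q$pattern\E/ ? 1 : 0;
}

print "raw bytes:     ", matches($raw_html, $word), "\n";
print "decoded chars: ", matches($html, $word), "\n";
```

The character pattern only matches the decoded text. Against the raw bytes each byte is compared as a separate code point, so the two-character word never lines up with its six-byte encoding.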

Votes: 3

Stack Overflow user

Posted 2009-12-23 18:08:28

Because your source code contains UTF-8 characters, you must add:

use utf8;

This tells Perl that your script is written in UTF-8, so string literals are read as characters rather than as raw bytes.
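
A small sketch of what the pragma changes, assuming the script file itself is saved as UTF-8. The literal_lengths helper is only there for illustration; it builds the same literal once with the pragma in scope and once without, then compares lengths.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# "use utf8" is lexically scoped, so each do-block sees its own setting.
sub literal_lengths {
    my $as_chars = do { use utf8; "英磅" };   # decoded into characters
    my $as_bytes = do { no utf8;  "英磅" };   # left as raw UTF-8 bytes
    return (length($as_chars), length($as_bytes));
}

my ($n_chars, $n_bytes) = literal_lengths();
print "with use utf8: $n_chars, without: $n_bytes\n";
```

With the pragma the literal is 2 characters long; without it, it is the 6 bytes of its UTF-8 encoding. A regex built from one form will not match text held in the other.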

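Votes: 7

Stack Overflow user

Posted 2009-12-23 18:13:55

I ran your code, and the Chinese characters did not match.

I then checked the HTML, and it does not contain those characters, so that may be why nothing matched. Next I tried a different character (联) and removed the encode call, i.e. my $str_chinese = "联";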

With that change in place, the character matches.
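
One possible explanation for why dropping encode helps, sketched offline: without use utf8 the literal is already a UTF-8 byte string, so it can be compared directly with the raw bytes from $res->content, while passing it through encode encodes those bytes a second time. The page fragment and the found helper below are hypothetical, and the file is assumed to be saved as UTF-8.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
no utf8;                         # explicit: keep literals as raw bytes
use Encode qw(encode);

my $literal = "联";                         # 3 raw UTF-8 bytes
my $double  = encode('UTF-8', $literal);    # each byte encoded again

# Hypothetical stand-in for $res->content (raw bytes of the page).
my $raw_html = "<td>联</td>";

# Returns 1 if $pattern occurs literally in $text, 0 otherwise.
sub found {
    my ($text, $pattern) = @_;
    return $text =~ /\Q$pattern\E/ ? 1 : 0;
}

printf "literal: %d bytes, matched=%d\n", length($literal), found($raw_html, $literal);
printf "encoded: %d bytes, matched=%d\n", length($double),  found($raw_html, $double);
```

The plain literal is 3 bytes and matches; the re-encoded one grows to 6 bytes and does not. Matching bytes against bytes works here, but decoding the page and using use utf8, as in the other answers, keeps character classes and case-insensitive matching meaningful.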

Votes: 4

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/1951613
