How to search for non-ASCII text in IMAP email subjects

Stack Overflow user
Asked 2020-07-07 20:31:08
3 answers · 504 views · 0 followers · 3 votes

Here is an MWE:

#!/usr/bin/perl

use utf8;
use strict;
use warnings;
use Net::IMAP::Client;
use Encode qw/decode/;
use open ':std', ':encoding(UTF-8)';

my $user = 'my-user@gmail.com';
my $pwd = 'secret';

my $imap = Net::IMAP::Client->new(
    server          => 'imap.gmail.com',
    user            => $user,
    pass            => $pwd,
    ssl             => 1,      # (use SSL? default no)
    ssl_verify_peer => 1,      # (use ca to verify server, default yes)
    port            => 993
) or die "Could not connect to IMAP server: $!";

$imap->login or die('Login failed: ' . $imap->last_error);

# all the invoices from my telephone company
$imap->select('INBOX');
my $messages = $imap->search({
    from    => 'invoice@mgts.ru',
    #subject => '2020',
});

unless(defined($messages))
{
    $imap->logout();
    die "no messages";
}

foreach my $id (@$messages)
{
    my $summary = $imap->get_summaries([$id])->[0];

    my $subject = $summary->subject;
    $subject = decode('MIME-Header', $subject);
    print $subject."\n";
}

This prints all the invoices in the mailbox that came from invoice@mgts.ru:

Счёт за услуги ПАО МГТС за Июнь 2017 г.
Счёт за услуги ПАО МГТС за Июль 2017 г.
Счёт за услуги ПАО МГТС за Август 2017 г.
Счёт за услуги ПАО МГТС за Ноябрь 2017 г.
Счёт за услуги ПАО МГТС за Декабрь 2017 г.
Счёт за услуги ПАО МГТС за Ноябрь 2018 г.
Счёт за услуги ПАО МГТС за Декабрь 2018 г.
Счёт за услуги ПАО МГТС за Декабрь 2019 г.
Счёт за услуги ПАО МГТС за Март 2020 г.
Счёт за услуги ПАО МГТС за Апрель 2020 г.

Everything is correct.

Now I add a new condition by uncommenting this line:

#subject => '2020',

I get all the invoices for 2020:

Счёт за услуги ПАО МГТС за Март 2020 г.
Счёт за услуги ПАО МГТС за Апрель 2020 г.

But when I add the word "Апрель" (April) to the search:

subject => 'Апрель 2020',

I get "no messages", even though this substring is present in the subject of one email in the mailbox.

The subject in the email looks like this:

Subject: =?utf-8?Q?=D0=A1=D1=87=D1=91=D1=82=20?=
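
For reference, this header value is a MIME encoded-word (RFC 2047): the subject is stored as quoted-printable UTF-8 bytes, so the Cyrillic text never appears literally in the raw header. A minimal sketch in Python, using only the standard library, that decodes the fragment shown above (the variable names are illustrative):

```python
from email.header import decode_header

# The raw header value exactly as shown above.
raw_subject = "=?utf-8?Q?=D0=A1=D1=87=D1=91=D1=82=20?="

# decode_header returns a list of (payload, charset) pairs;
# payloads of encoded-words come back as bytes.
parts = decode_header(raw_subject)
decoded = "".join(
    payload.decode(charset or "ascii") if isinstance(payload, bytes) else payload
    for payload, charset in parts
)
print(repr(decoded))
```

This is the same job that decode('MIME-Header', $subject) does in the Perl script.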

What is going wrong, and how can I fix it?

3 Answers

Stack Overflow user

Accepted answer

Posted 2020-07-08 06:44:11

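  1. Net::IMAP::Client does not seem to support the part of IMAP4rev1 that covers UTF-8 encoded strings.
  2. According to the Gmail documentation, you can use the X-GM-RAW attribute to get the same results as the Gmail web interface.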

If I had to use Net::IMAP::Client, I would add a new method that performs an X-GM-RAW search:

#!/usr/bin/perl

use utf8;
use strict;
use warnings;
use Net::IMAP::Client;
use Encode qw/decode/;
use IO::Socket qw(:crlf);
use open ':std', ':encoding(UTF-8)';

my $user = 'my-user@gmail.com';
my $pwd = 'secret';

my $imap = Net::IMAP::Client->new(
    server          => 'imap.gmail.com',
    user            => $user,
    pass            => $pwd,
    ssl             => 1,      # (use SSL? default no)
    ssl_verify_peer => 1,      # (use ca to verify server, default yes)
    port            => 993
) or die "Could not connect to IMAP server: $!";

$imap->login or die('Login failed: ' . $imap->last_error);

# Add search_gmail method to Net::IMAP::Client
sub Net::IMAP::Client::search_gmail {
    my ($self, $criteria) = @_;

    my @crit;
    for my $key (keys %{$criteria}) {
        push @crit, join ":", $key, $criteria->{$key};
    }

    my $crit_str = join q{ }, @crit;

    my ($ok, $lines);
    ($ok, $lines) = $self->_tell_imap('SEARCH' => "CHARSET UTF-8 X-GM-RAW " . do {
        use bytes;
        sprintf qq{{%d}%s%s}, length($crit_str), $CRLF, $crit_str;
    });

    return unless $ok;

    for my $line (@{$lines->[1]}) {
        if ($line =~ s/^\*\s+SEARCH\s+//ig) {
            $line =~ s/\s*$//g;
            return [ map { $_ + 0 } split(/\s+/, $line) ];
        }
    }
}

# all the invoices from my telephone company
$imap->select('INBOX');
my $messages = $imap->search_gmail({
    from    => 'invoice@mgts.ru',
    #subject => '2020',
});

unless(defined($messages))
{
    $imap->logout();
    die "no messages";
}

foreach my $id (@$messages)
{
    my $summary = $imap->get_summaries([$id])->[0];

    my $subject = $summary->subject;
    $subject = decode('MIME-Header', $subject);
    print $subject."\n";
}
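
To make the wire format easier to see, here is a rough Python sketch of what search_gmail assembles: the criteria are joined into a Gmail-style query ("key:value" pairs separated by spaces), and the query is sent as an IMAP literal whose size prefix is the UTF-8 byte count, not the character count. The function name build_x_gm_raw_search is hypothetical and exists only for this illustration. Python dicts keep insertion order, whereas the order of keys on a Perl hash is not fixed.

```python
def build_x_gm_raw_search(criteria):
    """Return (command_line, literal_bytes) for a Gmail X-GM-RAW search.

    Hypothetical helper for illustration; it only formats the request
    and does not talk to a server.
    """
    # Gmail search syntax: "from:someone subject:word"
    query = " ".join(f"{key}:{value}" for key, value in criteria.items())
    literal = query.encode("utf-8")
    # The {n} prefix announces the literal size in bytes.
    command = "SEARCH CHARSET UTF-8 X-GM-RAW {%d}" % len(literal)
    return command, literal


command, literal = build_x_gm_raw_search(
    {"from": "invoice@mgts.ru", "subject": "Апрель"}
)
print(command)
print(literal.decode("utf-8"))
```

Each Cyrillic letter takes two bytes in UTF-8, which is why the Perl version computes the length under "use bytes".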
Score: 4

Stack Overflow user

Posted 2020-07-07 21:59:19

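(See the Python example at the bottom of this post, which seems to work.)

I tried another module, Net::IMAP::Simple::Gmail, because it has a debug output option (I first sent myself an email with the subject Апрель 2020 so that I could easily test the behavior):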

use feature qw(say);
use strict;
use warnings;
use utf8;
use Net::IMAP::Simple::Gmail;
use Encode qw(encode_utf8);

my $server = 'imap.gmail.com';
my $imap = Net::IMAP::Simple::Gmail->new($server, debug => 1);
my $user = 'me@gmail.com';
my $pass = 'mypass';

if(!$imap->login($user,$pass)){
    die "Login failed: " . $imap->errstr . "\n";
}
my $num_messages = $imap->select('INBOX') or die $imap->errstr;
my @ids = $imap->search(encode_utf8('SUBJECT "Апрель 2020"'));
say "Found ", (scalar @ids), " messages";

Output:

[...l/5.30.0/Net/IMAP/Simple.pm line 133 in sub _connect] connecting to imap.gmail.com:993
[...l/5.30.0/Net/IMAP/Simple.pm line 133 in sub _connect] connected, returning socket
[./p.pl line 11 in sub new] waiting for socket ready
[./p.pl line 11 in sub new] looking for greeting
[./p.pl line 11 in sub new] got a greeting: * OK Gimap ready for requests from 51.174.5.83 u18mb43700719ljl\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1252 in sub _send_cmd] 0 LOGIN me@gmail.com "mypass"\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 265 in sub _process_cmd] * CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE ENABLE MOVE CONDSTORE ESEARCH UTF8=ACCEPT LIST-EXTENDED LIST-STATUS LITERAL- SPECIAL-USE APPENDLIMIT=35651584\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE ENABLE MOVE CONDSTORE ESEARCH UTF8=ACCEPT LIST-EXTENDED LIST-STATUS LITERAL- SPECIAL-USE APPENDLIMIT=35651584\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 265 in sub _process_cmd] 0 OK hakon.hagland@gmail.com authenticated (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 0 OK hakon.hagland@gmail.com authenticated (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1252 in sub _send_cmd] 1 SELECT "INBOX"\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * FLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * FLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk \*)] Flags permitted.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk \*)] Flags permitted.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [UIDVALIDITY 638142060] UIDs valid.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [UIDVALIDITY 638142060] UIDs valid.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * 27869 EXISTS\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * 27869 EXISTS\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * 0 RECENT\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * 0 RECENT\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [UIDNEXT 32724] Predicted next UID.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [UIDNEXT 32724] Predicted next UID.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [HIGHESTMODSEQ 4375397]\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [HIGHESTMODSEQ 4375397]\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] 1 OK [READ-WRITE] INBOX selected. (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 1 OK [READ-WRITE] INBOX selected. (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1252 in sub _send_cmd] 2 SEARCH SUBJECT "Апрель 2020"\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 615 in sub _process_cmd] 2 BAD Could not parse command\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 2 BAD Could not parse command\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1192 in sub _seterrstr] Could not parse command\r
Found 0 messages

Note this line of the output:

[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 2 BAD Could not parse command\r\n

This is the reply that comes back from the socket at line 1260.
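
The BAD reply is consistent with IMAP4rev1 (RFC 3501): a quoted string may only carry 7-bit characters, and SEARCH assumes US-ASCII unless a CHARSET is named. A rough sketch of that rule in Python follows; the helper name format_subject_search is made up for this illustration, and it only formats the request without contacting a server.

```python
def format_subject_search(text):
    """Return (command_line, literal) for a SUBJECT search.

    literal is None when the plain quoted form is enough.
    Hypothetical helper; text containing CR or LF is not handled.
    """
    if text.isascii():
        # 7-bit text fits in a quoted string; escape backslash and quote.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'SEARCH SUBJECT "{escaped}"', None
    # 8-bit text: declare the charset and send the bytes as a literal.
    literal = text.encode("utf-8")
    return "SEARCH CHARSET UTF-8 SUBJECT {%d}" % len(literal), literal


print(format_subject_search("2020"))
print(format_subject_search("Апрель 2020")[0])
```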

Similarly, using the Python IMAPClient module:

user = 'me@gmail.com';
passw = "mypass";
from imapclient import IMAPClient
server = IMAPClient('imap.gmail.com', use_uid=True, ssl=True)
result = server.login(user, passw)
print(result)
select_info = server.select_folder('INBOX')
print(select_info)
messages = server.search(['SUBJECT', 'Апрель 2020'.encode('utf8')])
print(messages)

This gives the output:

b'me@gmail.com authenticated (Success)'
{b'PERMANENTFLAGS': (b'\\Answered', b'\\Flagged', b'\\Draft', b'\\Deleted', b'\\Seen', b'$MailFlagBit0', b'$MailFlagBit1', b'$NotJunk', b'$NotPhishing', b'$Phishing', b'NotJunk', b'\\*'), b'FLAGS': (b'\\Answered', b'\\Flagged', b'\\Draft', b'\\Deleted', b'\\Seen', b'$MailFlagBit0', b'$MailFlagBit1', b'$NotJunk', b'$NotPhishing', b'$Phishing', b'NotJunk'), b'UIDVALIDITY': 638142060, b'EXISTS': 27869, b'RECENT': 0, b'UIDNEXT': 32724, b'HIGHESTMODSEQ': 4375417, b'READ-WRITE': True}
Traceback (most recent call last):
  File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 982, in _search
    data = self._raw_command_untagged(b'SEARCH', args)
  File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 1445, in _raw_command_untagged
    typ, data = self._raw_command(command, args, uid=uid)
  File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 1507, in _raw_command
    return self._imap._command_complete(to_unicode(command), tag)
  File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/imaplib.py", line 1027, in _command_complete
    raise self.error('%s command error: %s %s' % (name, typ, data))
imaplib.error: SEARCH command error: BAD [b'Could not parse command']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./t.py", line 11, in <module>
    messages = server.search(['SUBJECT', 'Апрель 2020'.encode('utf8')])
  File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 956, in search
    return self._search(criteria, charset)
  File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 995, in _search
    criteria='"%s"' % criteria if not isinstance(criteria, list) else criteria
imapclient.exceptions.InvalidCriteriaError: b'Could not parse command'

This error may have been caused by a syntax error in the criteria: ['SUBJECT', b'\xd0\x90\xd0\xbf\xd1\x80\xd0\xb5\xd0\xbb\xd1\x8c 2020']
Please refer to the documentation for more information about search criteria syntax..
https://imapclient.readthedocs.io/en/master/#imapclient.IMAPClient.search
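
One thing that may be worth trying with IMAPClient: its search method accepts a charset argument, and the call above does not pass one. The sketch below wraps that call; search_subject is a hypothetical helper, and this variant was not run against Gmail in this thread, so treat it as an assumption rather than a confirmed fix.

```python
def search_subject(client, text, charset="UTF-8"):
    """Search the selected folder for `text` in the Subject header.

    `client` is assumed to be a logged-in imapclient.IMAPClient with a
    folder already selected. Hypothetical helper for illustration.
    """
    # Pass the text as str and let IMAPClient encode it using `charset`.
    return client.search(["SUBJECT", text], charset=charset)


# Usage, assuming `server` was set up as in the example above:
#   messages = search_subject(server, "Апрель 2020")
#   print(messages)
```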

Finally, I found a Python example that does seem to work:

import imaplib
user = "me@gmail.com"
passw = "mypass";
sock = imaplib.IMAP4_SSL("imap.gmail.com", 993)
sock.login(user, passw)
sock.select()
sock.debug = 4
sock.literal = u"Апрель 2020".encode('utf8')
res = sock.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')
print(res)

Output:

  04:44.46 > b'DMOM3 UID SEARCH CHARSET UTF-8 SUBJECT {17}'
  04:44.52 < b'+ go ahead'
  04:44.52 write literal size 17
  04:44.67 < b'* SEARCH 32720'
  04:44.67 < b'DMOM3 OK SEARCH completed (Success)'
('OK', [b'32720'])
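
The {17} in the debug output is the UTF-8 byte length of "Апрель 2020" (six two-byte Cyrillic letters, a space and four digits). The same steps can be wrapped in a small function, sketched here; search_subject_utf8 is a hypothetical name, and conn is assumed to be a logged-in imaplib.IMAP4_SSL connection with a mailbox already selected.

```python
def search_subject_utf8(conn, text):
    """UID SEARCH for `text` in Subject, sent as a UTF-8 literal.

    Relies on imaplib's `literal` attribute, which is appended to the
    next command as an IMAP literal, as in the example above.
    """
    conn.literal = text.encode("utf-8")
    typ, data = conn.uid("SEARCH", "CHARSET", "UTF-8", "SUBJECT")
    if typ != "OK":
        raise RuntimeError(f"SEARCH failed: {typ} {data!r}")
    # data[0] is a space-separated byte string of UIDs (empty if none).
    return [int(uid) for uid in data[0].split()]


# Usage, assuming `sock` was set up as in the example above:
#   print(search_subject_utf8(sock, "Апрель 2020"))
```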
Score: 1

Stack Overflow user

Posted 2020-07-07 20:38:45

I would start by handling Unicode properly:

use strict; use warnings;
use utf8;
binmode $_, ":utf8" for qw/STDOUT STDIN STDERR/;

With a recent Perl:

use feature 'unicode_strings';

https://perldoc.perl.org/perlunicode.html#Important-Caveats

Score: -1
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62783448
