文档中心 日志服务 常见问题 采集问题 如何采集部分字段缺失的日志

如何采集部分字段缺失的日志

最近更新时间:2019-11-27 11:45:17

完整日志内容可以通过完全正则采集配置

原始日志中,完整的日志格式如下:

2019/11/18 03:32:31 [error] 20803#0: *492368812 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 191.12.201.78, server: run.sports.qq.com, request: "GET /999tst999?g_tk=1514204808&p_tk=OpWB6XOF96f2rlAApXgJE50ziHV596xlQ99lEenfZyY_ HTTP/1.1", upstream: "fastcgi://127.0.0.1:10000", host: "run.sports.qq.com", referrer: "\N"

通过如下正则表达式,可以提取出对应的字段:

([^\[]+)\s([^,]+),\sclient:\s([^,]+),\sserver:\s([^,]+),\srequest:\s([^,]+),\supstream:\s([^,]+),\shost:\s([^,]+),\sreferrer:\s([^,]+)

原始日志格式化的字段如下(推荐使用 regex 构建自定义的正则表达式):

各字段依次命名为:log_time、content、client、server、request、upstream、host、referrer。

日志内容字段部分缺失如何采集?

但是在某些情况下,日志中的 upstream、referrer 字段会缺失。

例如,无 upstream 有 referrer:

2019/11/18 04:02:38 [error] 20802#0: *492391323 access forbidden by rule, client: 45.71.63.206, server: admin.sports.qq.com, request: "GET /index HTTP/1.1", host: "admin.sports.qq.com", referrer: "http://admin.sports.qq.com/index"

又或者,无 upstream 无 referrer:

2019/11/18 14:38:42 [error] 20803#0: *492866847 "/root/test/index.html" is forbidden (13: Permission denied), client: 118.79.20.201, server: -, request: "HEAD / HTTP/1.1", host: "451a9d-0.sh.12531.clb.myqcloud.com"

此时,前文提到的正则表达式就无法适用这两种缺失的场景。

通过正则表达式中的组合的非捕获括号语法(?:x)以及匹配0次或1次?语法,既能兼容 upstream 和 referrer 缺失的场景,又能保证捕获括号提取的序号一致。

完整的正则表达式如下:

([^\[]+)\s([^,]+),\sclient:\s([^,]+),\sserver:\s([^,]+),\srequest:\s([^,]+)(?:,\supstream:\s([^,]+))?,\shost:\s([^,]+)(?:,\sreferrer:\s([^,]+))?

针对无 upstream 和 referrer 的日志进行测试,CLS 采集页面上的提取截图如下:

索引配置

开启索引,并配置全文和键值索引。

检索日志

构造测试日志,执行命令如下:

[root@VM_2_4_centos tools]# echo '2019/11/18 03:32:31 [error] 20803#0: *492368812 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 191.12.201.78, server: run.sports.qq.com, request: "GET /999tst999?g_tk=1514204808&p_tk=OpWB6XOF96f2rlAApXgJE50ziHV596xlQ99lEenfZyY_ HTTP/1.1", upstream: "fastcgi://127.0.0.1:10000", host: "run.sports.qq.com", referrer: "\N"' >> /var/log/regex/regex.log
[root@VM_2_4_centos tools]# echo '2019/11/18 04:02:38 [error] 20802#0: *492391323 access forbidden by rule, client: 45.71.63.206, server: admin.sports.qq.com, request: "GET /index HTTP/1.1", host: "admin.sports.qq.com", referrer: "http://admin.sports.qq.com/index"' >> /var/log/regex/regex.log 
[root@VM_2_4_centos tools]# echo '2019/11/18 14:38:42 [error] 20803#0: *492866847 "/root/test/index.html" is forbidden (13: Permission denied), client: 118.79.20.201, server: -, request: "HEAD / HTTP/1.1", host: "451a9d-0.sh.12531.clb.myqcloud.com"' >> /var/log/regex/regex.log 

在日志服务控制台检索日志,如下所示。
检索关键词:referrer:admin*

检索关键词:upstream:fastcgi