Q: How to monitor a log file with Nagios?

Stack Overflow user

Asked 2010-03-03 16:54:42

6 answers, 78K views, 0 followers, 25 votes

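We are successfully using Nagios to monitor our network. However, we have a syslog of critical application errors, and although I set up check_log, it doesn't seem to work as well as monitoring a device does.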

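The problems are:

It only shows the last entry.
There seems to be no way to acknowledge a critical error and have the monitor return to an OK state.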

Is Nagios the wrong tool, or are we just not setting up the service check correctly?

Here are my entries:

# log file
define command{
        command_name    check_log
        command_line    $USER1$/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.log -q ?
}


# Define the log monitoring service
define service{
        name                            logfile-check           ;
        use                             generic-service         ;
        check_period                    24x7                    ;
        max_check_attempts              1                       ;
        normal_check_interval           5                       ;
        retry_check_interval            1                       ;
        contact_groups                  admins                  ;
        notification_options            w,u,c,r                 ;
        notification_period             24x7                    ;
        register                        0                       ;
        }

define service{
        use                             logfile-check
        host_name                       localhost
        service_description             CritLogFile
        check_command                   check_log
}

Tags: logfiles, nagios

6 answers

Stack Overflow user

Accepted answer

Posted 2015-01-26 10:14:33

Since there are many ways to reach the goal, here is another one: ConSol provides a nice plugin, check_logfiles.

Supports regular expressions
Supports log rotation

To use it, you need a cfg file. Here is an example for an Oracle database:

@searches = ({
  tag => 'oraalerts',
  options => 'sticky=28800',
  logfile => '/u01/app/oracle/diag/rdbms/davmdkp/DAVMDKP1/trace/alert_DAVMDKP1.log',
  criticalpatterns => [
      'ORA\-0*204[^\d]',        # error in reading control file
      'ORA\-0*206[^\d]',        # error in writing control file
      'ORA\-0*210[^\d]',        # cannot open control file
      'ORA\-0*257[^\d]',        # archiver is stuck
      'ORA\-0*333[^\d]',        # redo log read error
      'ORA\-0*345[^\d]',        # redo log write error
      'ORA\-0*4[4-7][0-9][^\d]',# ORA-0440 - ORA-0485 background process failure
      'ORA\-0*48[0-5][^\d]',
      'ORA\-0*6[0-3][0-9][^\d]',# ORA-0600 - ORA-0639 internal errors
      'ORA\-0*1114[^\d]',        # datafile I/O write error
      'ORA\-0*1115[^\d]',        # datafile I/O read error
      'ORA\-0*1116[^\d]',        # cannot open datafile
      'ORA\-0*1118[^\d]',        # cannot add a data file
      'ORA\-0*1122[^\d]',       # database file 16 failed verification check
      'ORA\-0*1171[^\d]',       # datafile 16 going offline due to error advancing checkpoint
      'ORA\-0*1201[^\d]',       # file 16 header failed to write correctly
      'ORA\-0*1208[^\d]',       # data file is an old version - not accessing current version
      'ORA\-0*1578[^\d]',        # data block corruption
      'ORA\-0*1135[^\d]',        # file accessed for query is offline
      'ORA\-0*1547[^\d]',        # tablespace is full
      'ORA\-0*1555[^\d]',        # snapshot too old
      'ORA\-0*1562[^\d]',        # failed to extend rollback segment
      'ORA\-0*162[89][^\d]',     # ORA-1628 - ORA-1632 maximum extents exceeded
      'ORA\-0*163[0-2][^\d]',
      'ORA\-0*165[0-6][^\d]',    # ORA-1650 - ORA-1656 tablespace is full
      'ORA\-16014[^\d]',      # log cannot be archived, no available destinations
      'ORA\-16038[^\d]',      # log cannot be archived
      'ORA\-19502[^\d]',      # write error on datafile
      'ORA\-27063[^\d]',         # number of bytes read/written is incorrect
      'ORA\-0*4031[^\d]',        # out of shared memory.
      'No space left on device',
      'Archival Error',
  ],
  warningpatterns => [
      'ORA\-0*3113[^\d]',        # end of file on communication channel
      'ORA\-0*6501[^\d]',         # PL/SQL internal error
      'ORA\-0*1140[^\d]',         # follows WARNING: datafile #20 was not in online backup mode
      'Archival stopped, error occurred. Will continue retrying',
  ]
});
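To try patterns like these outside Nagios, here is a minimal Python sketch that classifies log lines against a few of the patterns from the cfg above. It only illustrates the matching idea; it is not check_logfiles itself (which is written in Perl and also tracks file position, handles rotation, and honors options such as `sticky`). The names `classify` and `summarize` are hypothetical, and the sample log lines are made up. These particular patterns use syntax that Python's `re` module interprets the same way Perl does.

```python
import re

# A subset of the patterns from the cfg above; Python's re module reads
# these simple Perl-style patterns the same way.
CRITICAL_PATTERNS = [
    r"ORA\-0*257[^\d]",          # archiver is stuck
    r"ORA\-0*6[0-3][0-9][^\d]",  # ORA-0600 - ORA-0639 internal errors
    r"ORA\-0*1555[^\d]",         # snapshot too old
    r"No space left on device",
]
WARNING_PATTERNS = [
    r"ORA\-0*3113[^\d]",         # end of file on communication channel
]


def classify(line):
    """Return "CRITICAL", "WARNING" or None for a single log line."""
    if any(re.search(p, line) for p in CRITICAL_PATTERNS):
        return "CRITICAL"
    if any(re.search(p, line) for p in WARNING_PATTERNS):
        return "WARNING"
    return None


def summarize(lines):
    """Map a batch of new log lines to a Nagios exit code and status text."""
    levels = [classify(line) for line in lines]
    crit = [l for l, lvl in zip(lines, levels) if lvl == "CRITICAL"]
    warn = [l for l, lvl in zip(lines, levels) if lvl == "WARNING"]
    if crit:
        return 2, f"CRITICAL: {len(crit)} errors, {len(warn)} warnings, last: {crit[-1]}"
    if warn:
        return 1, f"WARNING: {len(warn)} warnings, last: {warn[-1]}"
    return 0, "OK: no errors or warnings"
```

For example, `summarize(["ORA-00600: internal error code", "ORA-03113: end-of-file on communication channel"])` returns exit code 2 with the ORA-00600 line as the last critical entry. Note that the trailing `[^\d]` requires some character after the error number, so it keeps `ORA-06000` from matching the ORA-06xx pattern, but it also means a bare `ORA-00600` at the very end of a line would not match.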

Votes: 3

Stack Overflow user

Posted 2011-01-25 21:40:16

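When monitoring logs with Nagios, the log checker typically returns a warning only once for each newly found error message (so it has to keep some state in order to know to ignore those messages on subsequent runs). Therefore I normally set: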

max_check_attempts              1
is_volatile                     1

This makes Nagios send the alert only once, instead of repeatedly, and then return to normal.
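The "keep some state and report each message only once" idea can be sketched in Python. This is a minimal illustration, not logwarn or any existing plugin: the function name `check_new_errors` and the JSON state file are hypothetical. It remembers the byte offset it reached on the previous run, consumes only complete lines, and starts over from the beginning if the file has become shorter than that offset (a crude stand-in for rotation detection). Unlike check_log, the first run scans the whole file instead of just initializing.

```python
import json
import os
import re


def check_new_errors(logfile, statefile, pattern):
    """Return (exit_code, message) for matching lines appended since the last run."""
    offset = 0
    if os.path.exists(statefile):
        with open(statefile) as f:
            offset = json.load(f)["offset"]
    # A file shorter than the saved offset was probably rotated or truncated.
    if os.path.getsize(logfile) < offset:
        offset = 0
    with open(logfile, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a half-written last line is left for next time.
    end = data.rfind(b"\n") + 1
    with open(statefile, "w") as f:
        json.dump({"offset": offset + end}, f)
    regex = re.compile(pattern)
    text = data[:end].decode("utf-8", errors="replace")
    matches = [line for line in text.splitlines() if regex.search(line)]
    if matches:
        # Nagios convention: exit code 2 means CRITICAL.
        return 2, f"CRITICAL: {len(matches)} new match(es), last: {matches[-1]}"
    return 0, "OK: no new matches"
```

As a plugin, you would print the message and exit with the code. Combined with `max_check_attempts 1` and `is_volatile 1`, each run that finds new matches raises one alert, and the next run without new matches goes back to OK on its own. The size-based rotation check is a simplification: if a rotated file has already grown past the old offset by the next check, the lines before that offset are skipped.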

My favorite log checker is logwarn, but I'm biased, since I wrote it myself after not finding any existing log checker I liked. The logwarn package includes a Nagios plugin.

Votes: 29

Stack Overflow user

Posted 2010-03-03 20:33:28

Nothing in your configuration jumps out at me as misconfigured.

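By design, check_log only shows either an OK message or the last log entry that triggered the alert. If you need to see multiple entries, you will have to modify the plugin.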

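However, I do find it a bit odd that you're not getting a recovery. Given the way check_log works (by comparing the current log with the previous copy), you should get a recovery on the next service check, unless, of course, additional matches were added to the log since the last check.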
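The comparison described above can be modeled in a few lines of Python, which makes it easier to see why the next check should recover. This is an approximate model of check_log's behavior (initialize on the first run, compare the log with the old copy, count matches among the new lines, report the last one, then overwrite the old copy); it is not the plugin's code, `check_log_like` is a hypothetical name, and it uses Python regular expressions where the plugin uses grep.

```python
import difflib
import os
import re
import shutil


def check_log_like(logfile, oldlog, query):
    """Approximate model of check_log; returns (exit_code, message)."""
    if not os.path.exists(oldlog):
        # First run: just take a snapshot and report OK.
        shutil.copyfile(logfile, oldlog)
        return 0, "Log check data initialized..."
    with open(oldlog) as f:
        old = f.read().splitlines()
    with open(logfile) as f:
        cur = f.read().splitlines()
    # Lines present in the current log but not in the old copy.
    new_lines = [d[2:] for d in difflib.ndiff(old, cur) if d.startswith("+ ")]
    matches = [line for line in new_lines if re.search(query, line)]
    # Overwrite the snapshot, so these lines are not reported again.
    shutil.copyfile(logfile, oldlog)
    if not matches:
        return 0, "Log check ok - 0 pattern matches found"
    # Only the count and the last matching entry are reported.
    return 2, f"({len(matches)}) {matches[-1]}"
```

Because the snapshot is overwritten on every run, a CRITICAL result can only persist if new matching lines keep arriving between checks. `difflib.ndiff` is used here for clarity and would be slow on very large logs.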

Does forcing another service check (or several) make it recover?

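Also, I don't mean this in a snarky way, but make sure it really is malfunctioning. Is your log getting additional matches between checks that keep it from recovering? Your query is "?", which will match anything new in the log. Is something else (not an error) being added to the log and inadvertently causing a match?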

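If none of the above applies, I'd suggest narrowing it down by taking Nagios out of the equation. Try running check_log manually (from the command line, but as the same user Nagios runs as) with a different oldlog. It should go like this: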

Run the check with a new oldlog: you get the initialization message
Run the check: check OK
Change the log
Run the check: check fails
Run the check: check OK
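The manual sequence above can be scripted. This is a minimal Python sketch, assuming the plugin accepts the same `-F`, `-O` and `-q` flags used in the question's command definition; the `PLUGIN` path is only a placeholder, and `build_command`, `run_check` and `walk_through` are hypothetical helper names. To avoid touching the real application log, it works on a scratch copy in a temporary directory with a brand-new oldlog. Run it as the Nagios user (for example via `sudo -u nagios`) so file permissions match what Nagios sees.

```python
import os
import shutil
import subprocess
import tempfile

# Placeholder path: point this at the check_log plugin on your system.
PLUGIN = "/usr/local/nagios/libexec/check_log"
# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(plugin, logfile, oldlog, query):
    """Same flags as the check_log command definition in the question."""
    return [plugin, "-F", logfile, "-O", oldlog, "-q", query]


def run_check(plugin, logfile, oldlog, query):
    """Run the plugin once and return (state name, plugin output)."""
    result = subprocess.run(
        build_command(plugin, logfile, oldlog, query),
        capture_output=True,
        text=True,
    )
    return STATES.get(result.returncode, "UNKNOWN"), result.stdout.strip()


def walk_through(plugin, logfile, query, test_line):
    """Run the answer's step sequence against a scratch copy of the log."""
    workdir = tempfile.mkdtemp()
    scratch_log = os.path.join(workdir, "scratch.log")
    oldlog = os.path.join(workdir, "scratch.oldlog")  # brand-new oldlog
    shutil.copyfile(logfile, scratch_log)

    steps = []
    steps.append(("new oldlog, expect init", run_check(plugin, scratch_log, oldlog, query)))
    steps.append(("expect OK", run_check(plugin, scratch_log, oldlog, query)))
    with open(scratch_log, "a") as f:  # change the log
        f.write(test_line + "\n")
    steps.append(("expect failure", run_check(plugin, scratch_log, oldlog, query)))
    steps.append(("expect OK again", run_check(plugin, scratch_log, oldlog, query)))
    for label, (state, output) in steps:
        print(f"{label}: {state} {output}")
    return steps
```

For example, `walk_through(PLUGIN, "/var/log/applications/appcrit.log", "?", "TEST ENTRY ?")`, where the test line is made up. If the four results come back as OK, OK, CRITICAL, OK, the plugin itself behaves as expected.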

If that doesn't work, focus on the log, the oldlog, and how check_log performs the check.

If it does work, that points more toward a problem in the Nagios configuration.

Votes: 4

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/2373212

