
AWStats is a great open source log parser, and you can do whatever you want with the database it generates.


At work we rolled our own log parser (in Java) so we could filter the known stacktraces out of the production logs to identify new potential production problems. It uses regex and it's tightly coupled to our log4j log format.

We've also got a python script that runs over the live production transaction logs and reports (to SiteScope - our infrastructure monitoring tool) when the count for particular errors is too high.

While both are useful, they are awful to maintain, and I would recommend trying any open source log parsing tool first, and resorting to writing your own only if necessary. Heck, I would even pay for a tool that did this ;)
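As a rough illustration of the "report when the count for particular errors is too high" idea above, here is a minimal Java sketch. The class name `ErrorCounter`, its methods, and the threshold value are hypothetical; how the result is pushed to a monitoring tool is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: counts errors by key and lists the keys that exceed a threshold. */
class ErrorCounter {
    // LinkedHashMap keeps the keys in first-seen order, which makes reports stable.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Records one occurrence of the given error key (e.g. an exception class name). */
    void record(String errorKey) {
        counts.merge(errorKey, 1, Integer::sum);
    }

    /** Returns every error key seen strictly more than {@code threshold} times. */
    List<String> overThreshold(int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

The parser would call `record(...)` for each error it recognizes, and a scheduled job would call `overThreshold(...)` and forward the result to whatever monitoring system is in place.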


You can use a Scanner, for example, together with some regexes. Here is a snippet of what I did to parse some complex logs:

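<pre><code>import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed timestamp layout (not shown in the original snippet); adapt it to your logs.
private static final String DATE_PATTERN = "yyyy-MM-dd HH:mm:ss,SSS";

private static final Pattern LINE_PATTERN = Pattern.compile(
    "(\\S+:)?(\\S+? \\S+?) \\S+? DEBUG \\S+? - DEMANDE_ID=(\\d+?) - listener (\\S+?) : (\\S+?)");

// EventLog (not shown) is a simple holder for the four values extracted below.
public static EventLog parse(String line) throws ParseException {
    String demandeId;
    String listenerClass;
    long startTime;
    long endTime;

    // SimpleDateFormat is not thread-safe, so a new instance is created per call.
    SimpleDateFormat sdf = new SimpleDateFormat(DATE_PATTERN);
    Matcher matcher = LINE_PATTERN.matcher(line);
    if (matcher.matches()) {
        // groupCount() is the number of groups in the pattern (5), so offset is 1:
        // it skips the optional first group and leaves the 4 interesting ones.
        int offset = matcher.groupCount() - 4;
        demandeId = matcher.group(2 + offset);
        listenerClass = matcher.group(3 + offset);
        long time = sdf.parse(matcher.group(1 + offset)).getTime();
        if ("starting".equals(matcher.group(4 + offset))) {
            startTime = time;
            endTime = -1;
        } else {
            startTime = -1;
            endTime = time;
        }
        return new EventLog(demandeId, listenerClass, startTime, endTime);
    }
    // The line does not match the expected format.
    return null;
}
</code></pre>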

So, with regexes and groups, it works pretty well.


Maybe you could write a Log4j CustomAppender? For example as described here: <a href="http://mytechattempts.wordpress.com/2011/05/10/log4j-custom-memory-appender/" rel="nofollow">http://mytechattempts.wordpress.com/2011/05/10/log4j-custom-memory-appender/</a>

Your custom appender could use a database or simple Java objects queried by JMX to get your statistics. It all depends on how much data needs to be persisted.


If you have the possibility (and with a good logging framework you should), I would recommend duplicating the logs in a parsable format. For example, with log4j, use an XMLLayout or something similar.
They will be a lot easier to parse, because you will then know the exact format of the logs.

You can do this quite transparently to the running application, purely through configuration. Consider using an asynchronous appender so that the extra logging disturbs the running application as little as possible.

Also, if the XMLLayout suits your needs, have a look at <a href="http://logging.apache.org/chainsaw/index.html" rel="nofollow">Apache Chainsaw</a>.


Log4j's LogFilePatternReceiver does exactly that...

This log entry:
17-11-2011 14:07:14 ERROR MyXmlParser - Premature end of file

can be parsed using the following log format (assuming origin is the same as 'logger'), with a timestamp format based on Java's SimpleDateFormat pattern dd-MM-yyyy kk:mm:ss:

TIMESTAMP LEVEL LOGGER - MESSAGE

The timezone and the level in the other form are a little trickier. There is the ability to remap strings to levels (E to ERROR), but I don't know whether the timezone will quite work.

Try it out, check out the source, and play with support for it in the latest developer snapshot of Chainsaw:

<a href="http://people.apache.org/~sdeboy" rel="nofollow">http://people.apache.org/~sdeboy</a>


I ended up not writing my own and using <a href="http://logstash.net/" rel="nofollow">logstash</a>.

We need to parse several log files and run some statistics on the log entries found (things such as the number of occurrences of certain messages, spikes of occurrences, etc.). The problem is writing a log parser that will handle several log formats and will allow me to add a new log format with very little work.

To make things easier for now I'm only looking at logs that will basically look similar to this:

<pre><code>[11/17/11 14:07:14:030 EST] MyXmlParser E Premature end of file
</code></pre>

so each log entry will contain a <code>timestamp</code>, <code>originator</code> (of the log message), <code>level</code> and log <code>message</code>. One important detail is that a message may have more than one line (e.g. stacktrace).
Another instance of the log entry could be:

<pre><code>17-11-2011 14:07:14 ERROR MyXmlParser - Premature end of file
</code></pre>

I'm looking for a good way to specify the log format as well as the most adequate technology to implement the parser for it.
I thought about regular expressions, but I think it will be tricky to handle situations such as a multi-line message (e.g. a stack trace).

Actually the task of writing a parser for a specific log format does not sound so easy itself when I consider the possibility of multi-line messages. How do you go about parsing those files?
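One common way to handle multi-line messages is to group the raw lines into entries before any field parsing: a line that starts with a timestamp opens a new entry, and every other line is treated as a continuation of the previous one. The following is a minimal sketch of that idea, assuming the `dd-MM-yyyy HH:mm:ss` sample format; the class name `EntryGrouper` and its regex are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: merges continuation lines (e.g. stack traces) into their entry. */
class EntryGrouper {
    // Assumed start-of-entry marker: "dd-MM-yyyy HH:mm:ss " as in the second sample line.
    private static final Pattern ENTRY_START =
            Pattern.compile("\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2} ");

    static List<String> group(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (ENTRY_START.matcher(line).lookingAt()) {
                // A new entry begins: flush the one collected so far.
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // Continuation line: append it to the open entry.
                current.append('\n').append(line);
            }
            // Lines that appear before the first recognizable entry are ignored.
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }
}
```

Each grouped entry can then be matched as a whole with a regex compiled with `Pattern.DOTALL`, so that the message group also captures the stack trace lines.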

Ideally I would be able to specify something like this as a log format:

<pre><code>[%TIMESTAMP] %ORIGIN %LEVEL %MESSAGE
</code></pre>

or

<pre><code>%TIMESTAMP %LEVEL %ORIGIN - %MESSAGE
</code></pre>

Obviously I would have to assign the right converter to each field so that it is handled correctly (e.g. the timestamp).
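A format string like the ones above can be translated mechanically into a regex with one named group per placeholder, with a converter attached to the timestamp field. The sketch below is only one possible shape: the class `LogFormat`, its constructor parameters, and the per-field regexes are assumptions made for illustration, not an existing library API.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compiles "%TIMESTAMP %LEVEL %ORIGIN - %MESSAGE" into a regex. */
class LogFormat {
    private static final Pattern PLACEHOLDER = Pattern.compile("%([A-Z]+)");

    private final Pattern linePattern;
    // SimpleDateFormat is not thread-safe; use one LogFormat instance per thread.
    private final SimpleDateFormat timestampFormat;

    LogFormat(String format, String timestampRegex, String datePattern) {
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(format);
        int last = 0;
        while (m.find()) {
            // Literal text between placeholders is quoted so "[" or "-" match themselves.
            regex.append(Pattern.quote(format.substring(last, m.start())));
            String name = m.group(1);
            String body;
            if (name.equals("TIMESTAMP")) {
                body = timestampRegex;
            } else if (name.equals("MESSAGE")) {
                body = ".*";      // with DOTALL this also spans multi-line messages
            } else {
                body = "\\S+";    // assumption: other fields contain no whitespace
            }
            regex.append("(?<").append(name).append('>').append(body).append(')');
            last = m.end();
        }
        regex.append(Pattern.quote(format.substring(last)));
        this.linePattern = Pattern.compile(regex.toString(), Pattern.DOTALL);
        this.timestampFormat = new SimpleDateFormat(datePattern);
    }

    /** Returns a matcher for the entry, or null if the entry does not fit the format. */
    Matcher match(String entry) {
        Matcher m = linePattern.matcher(entry);
        return m.matches() ? m : null;
    }

    /** Converter for the timestamp field of a successfully matched entry. */
    Date timestampOf(Matcher matched) throws ParseException {
        return timestampFormat.parse(matched.group("TIMESTAMP"));
    }
}
```

For the second sample line this could be used as `new LogFormat("%TIMESTAMP %LEVEL %ORIGIN - %MESSAGE", "\\S+ \\S+", "dd-MM-yyyy HH:mm:ss")`, after which `match(entry).group("LEVEL")` yields the level. Adding a new log format then means supplying a new format string, timestamp regex, and date pattern rather than writing a new parser.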

Could anyone give me some good ideas on how to implement this in a robust and modular way (I'm using Java)?

How to write a Generic Log Parser
