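flume-ng is a distributed, highly available log collection system. It is mainly used to aggregate business logs scattered across different servers into one centralized data store.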
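1. Installation and environment setup
Download page: http://flume.apache.org/download.html . Download the Apache Flume binary to the target server and unpack it.
Required Java version: Java 1.6 or later (Java 1.7 Recommended)
Set the JAVA_HOME variable.
Add the bin directory under the unpacked path to the PATH environment variable.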
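The two environment steps above could look roughly like the following sketch. The JAVA_HOME default is a hypothetical path and the FLUME_HOME default simply reuses the install directory that appears later in this post; substitute the real locations on your server.

```shell
# Assumption: Java lives here; replace with the actual JDK/JRE directory.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-7-openjdk}"

# Assumption: Flume was unpacked here (the directory used elsewhere in this post).
export FLUME_HOME="${FLUME_HOME:-/home/dongxiao.yang/apache-flume-1.4.0-bin}"

# Put the flume-ng launcher script on the PATH.
export PATH="$FLUME_HOME/bin:$PATH"
```

To make the settings survive a new login, the same lines are usually placed in a shell profile such as ~/.bashrc.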
2. Command-line options
Usage: /home/dongxiao.yang/apache-flume-1.4.0-bin/bin/flume-ng <command> [options]...
commands:
  help                      display this help text
  agent                     run a Flume agent
  avro-client               run an avro Flume client
  version                   show Flume version info

global options:
  --conf,-c <conf>          use configs in <conf> directory
  --classpath,-C <cp>       append to the classpath
  --dryrun,-d               do not actually start Flume, just print the command
  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the
                            plugins.d section in the user guide for more details.
                            Default: $FLUME_HOME/plugins.d
  -Dproperty=value          sets a Java system property value
  -Xproperty=value          sets a Java -X option

agent options:
  --conf-file,-f <file>     specify a config file (required)
  --name,-n <name>          the name of this agent (required)
  --help,-h                 display help text

avro-client options:
  --rpcProps,-P <file>      RPC client properties file with server connection params
  --host,-H <host>          hostname to which events will be sent
  --port,-p <port>          port of the avro source
  --dirname <dir>           directory to stream to avro source
  --filename,-F <file>      text file to stream to avro source (default: std input)
  --headerFile,-R <file>    File containing event headers as key/value pairs on each new line
  --help,-h                 display help text

Either --rpcProps or both --host and --port must be specified.
Note that if <conf> directory is specified, then it is always included first in the classpath.
A simple configuration file example

# Define the components of agent1
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1 sink2

# Describe the source
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /srv/apps/taskworker/log/taskworker.log
agent1.sources.source1.interceptors = e1
agent1.sources.source1.interceptors.e1.type = timestamp

# Describe the sink
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = 10.4.1.100
agent1.sinks.sink1.port = 10000

# Describe the channel
agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir = /home/dongxiao.yang/checkpoint
agent1.channels.channel1.dataDirs = /home/dongxiao.yang/data

# Bind the source and sinks to the channel
# Note: sink2 is declared and bound to channel1, but its type and other
# settings are not shown in this example; it needs at least
# agent1.sinks.sink2.type before it can be used.
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink2.channel = channel1

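The avro sink in this example sends events to 10.4.1.100:10000, so an agent on that host has to listen with an avro source. This post does not show that side. The sketch below writes one possible minimal collector configuration; the agent name collector, the component names avroIn, mem and log, the output path, and the choice of a memory channel with a logger sink are all assumptions made for illustration, not the author's setup.

```shell
# Hypothetical output path; place the file in the collector's conf directory.
COLLECTOR_CONF="${COLLECTOR_CONF:-/tmp/collector.conf}"

cat > "$COLLECTOR_CONF" <<'EOF'
# Components of the (hypothetical) collector agent
collector.sources = avroIn
collector.channels = mem
collector.sinks = log

# Avro source listening on the port that agent1's avro sink targets
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 10000
collector.sources.avroIn.channels = mem

# In-memory channel (assumption: acceptable for a quick test only)
collector.channels.mem.type = memory

# Logger sink, which just writes received events to the Flume log
collector.sinks.log.type = logger
collector.sinks.log.channel = mem
EOF

echo "wrote $COLLECTOR_CONF"
```

On the collector host this file would be started with flume-ng agent and --name collector. A real deployment would likely swap the logger sink for a persistent one.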
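Launch command format:

flume-ng agent --conf /home/dongxiao.yang/apache-flume-1.4.0-bin/conf/ \
    --conf-file /home/dongxiao.yang/apache-flume-1.4.0-bin/conf/ \
    --name agent1 -Dflume.root.logger=INFO,console -Duser.timezone=UTC

Note: --conf-file must name the agent's configuration file itself. The path shown here stops at the conf/ directory, so append the name of your configuration file. The value of --name must match the agent prefix used in that file (agent1 in the example).
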
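Because --conf-file has to be a file rather than a directory, a small wrapper can catch that mistake before Flume starts. The following is only a sketch: start_agent is a hypothetical helper name, and the flags it passes are the ones from the launch command in this post.

```shell
# Hypothetical helper: refuse to launch unless the config path is a regular file.
# Usage: start_agent <conf-dir> <conf-file> <agent-name>
start_agent() {
    conf_dir=$1
    conf_file=$2
    agent_name=$3

    if [ ! -f "$conf_file" ]; then
        echo "not a configuration file: $conf_file" >&2
        return 1
    fi

    # Same options as the launch command in this post.
    flume-ng agent --conf "$conf_dir" --conf-file "$conf_file" \
        --name "$agent_name" \
        -Dflume.root.logger=INFO,console -Duser.timezone=UTC
}
```

Calling start_agent with the conf directory as its second argument returns 1 and prints an error instead of invoking flume-ng.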
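References:
http://flume.apache.org/FlumeUserGuide.html (official documentation)
Apache Flume Distributed Log Collection for Hadoop.pdf: based on version 1.3; mainly describes several architectures for collecting common log files and writing them to HDFS.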