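The overall architecture is straightforward: Filebeat collects the logs, Redis serves as a buffer and message queue, Logstash reads from Redis and performs further processing and filtering, the processed logs are sent to the Elasticsearch (ES) cluster, and Kibana displays them.
1) System settings
Raise the maximum number of open files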
# tail /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 65536
* hard nproc 131072
To apply the limit temporarily in the current shell:
ulimit -n 65535
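Raise the maximum number of memory map areas per process (vm.max_map_count), which Elasticsearch requires
vim /etc/sysctl.conf
vm.max_map_count = 655360
Apply the change:
sysctl -p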
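Before starting Elasticsearch it can help to confirm that the limits above are actually in effect. The following is a minimal sketch, not part of the original setup: `check_min` is a hypothetical helper name, and the thresholds used in the usage note (65535 open files, 262144 map areas) are the minimums that Elasticsearch's bootstrap checks enforce, which the values configured above exceed.

```shell
#!/bin/sh
# check_min NAME ACTUAL REQUIRED
# Prints whether one numeric setting meets its minimum and returns
# non-zero when it does not. (Hypothetical helper, for illustration only.)
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2 >= $3)"
    else
        echo "$1: low ($2 < $3)"
        return 1
    fi
}
```

Typical use, run as the user that will start Elasticsearch:
check_min nofile "$(ulimit -n)" 65535
check_min vm.max_map_count "$(sysctl -n vm.max_map_count)" 262144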
Reference kernel parameter tuning:
# cat /etc/sysctl.conf
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same name in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
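# Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
# Ignore broadcast ICMP echo requests to avoid amplification attacks
net.ipv4.icmp_echo_ignore_broadcasts = 1
# Ignore bogus ICMP error responses
net.ipv4.icmp_ignore_bogus_error_responses = 1
# Disable IP forwarding and the sending of ICMP redirects
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
# Enable reverse path filtering
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
# Do not accept source-routed packets
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
# Disable the SysRq key
kernel.sysrq = 0
# Append the PID to core dump file names
kernel.core_uses_pid = 1
# Enable SYN cookies for SYN flood protection
net.ipv4.tcp_syncookies = 1
# Message queue size limits, in bytes
kernel.msgmnb = 65536
kernel.msgmax = 65536
# Shared memory limits: shmmax is the maximum segment size in bytes, shmall is the system-wide total in pages
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
# Maximum number of TIME-WAIT sockets; the default is 180000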
net.ipv4.tcp_max_tw_buckets = 6000
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
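# Maximum number of packets queued when an interface receives packets faster than the kernel can process them
net.core.netdev_max_backlog = 262144
# Maximum number of orphaned TCP sockets; this limit exists only to guard against simple DoS attacks
net.ipv4.tcp_max_orphans = 3276800
# Maximum number of queued connection requests not yet acknowledged by the client
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_timestamps = 0
# Number of SYN-ACK retransmissions before the kernel gives up on a connection
net.ipv4.tcp_synack_retries = 1
# Number of SYN retransmissions before the kernel gives up on a connection
net.ipv4.tcp_syn_retries = 1
# Enable fast recycling of TIME-WAIT sockets (removed in Linux 4.12 and unsafe behind NAT; omit on newer kernels)
net.ipv4.tcp_tw_recycle = 1
# Enable reuse: allow TIME-WAIT sockets to be reused for new TCP connections (relies on tcp_timestamps being enabled)
net.ipv4.tcp_tw_reuse = 1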
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_fin_timeout = 2
# Idle time in seconds before TCP starts sending keepalive probes when keepalive is enabled; the default is 2 hours
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
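# Local port range the system may use for outgoing connections
net.ipv4.ip_local_port_range = 1024 65000
# Ignore ICMP redirects so that nobody can alter the routing table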
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
vm.max_map_count = 655360
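After editing the file and running sysctl -p, it is easy to miss a key that the running kernel rejected or does not support. The sketch below compares a sysctl file against the live values under /proc/sys. Both function names are hypothetical, and the key-to-path mapping assumes that no component of a key contains a dot, which holds for every key listed above.

```shell
#!/bin/sh
# sysctl_path KEY
# Maps a key such as "net.ipv4.tcp_syncookies" to its /proc/sys path.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

# show_drift FILE
# For every "key = value" line in FILE, reports keys that are missing on
# this kernel or whose live value differs from the file.
show_drift() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | while IFS='=' read -r key want; do
        key=$(echo "$key" | tr -d '[:space:]')
        want=$(echo $want)               # collapse runs of whitespace
        path=$(sysctl_path "$key")
        [ -r "$path" ] || { echo "$key: not available on this kernel"; continue; }
        have=$(echo $(cat "$path"))      # /proc separates values with tabs
        [ "$have" = "$want" ] || echo "$key: file=$want live=$have"
    done
}
```

Running show_drift /etc/sysctl.conf prints nothing when every setting is in effect; on a kernel that no longer has a given key, that key is reported as not available.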
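The Elasticsearch cluster is deployed on three machines. First obtain the tar package, then extract it, create a user, and change the ownership of the ES directory, as follows: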
tar zxvf elasticsearch-7.2.0-linux-x86_64.tar.gz
mv elasticsearch-7.2.0 elasticsearch
useradd admin
echo "123456" | passwd admin --stdin
chown -R admin:admin /export/servers/elasticsearch/
Edit the configuration file
# egrep -v "^$|^#" elasticsearch.yml
cluster.name: fgt_es-server
node.name: es3.fgt.com
path.data: /export/servers/elasticsearch/data
path.logs: /export/servers/elasticsearch/logs
network.host: es3.fgt.com
http.port: 9200
discovery.seed_hosts: ["es1.fgt.com", "es2.fgt.com","es3.fgt.com"]
cluster.initial_master_nodes: ["es1.fgt.com","es2.fgt.com"]
On the other two machines, change node.name and network.host to that machine's own hostname.
Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
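Because only node.name and network.host differ between the three nodes, the file can be generated rather than edited by hand. This is a sketch under that assumption; `render_es_config` is a hypothetical helper, and every other value is copied from the configuration shown above.

```shell
#!/bin/sh
# render_es_config NODE_FQDN
# Prints an elasticsearch.yml for one node. Only node.name and
# network.host depend on the argument.
render_es_config() {
    cat <<EOF
cluster.name: fgt_es-server
node.name: $1
path.data: /export/servers/elasticsearch/data
path.logs: /export/servers/elasticsearch/logs
network.host: $1
http.port: 9200
discovery.seed_hosts: ["es1.fgt.com", "es2.fgt.com", "es3.fgt.com"]
cluster.initial_master_nodes: ["es1.fgt.com", "es2.fgt.com"]
EOF
}
```

For example, on the first node:
render_es_config es1.fgt.com > /export/servers/elasticsearch/config/elasticsearch.yml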
Edit the hosts file
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.27.10.75 es1.fgt.com
172.27.10.76 es2.fgt.com
172.27.10.77 es3.fgt.com
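The same three entries have to exist on every node, so adding them with a repeatable command avoids duplicate lines when the step is run twice. A minimal sketch; `add_host_entry` is a hypothetical helper, and writing to /etc/hosts requires root.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless NAME already appears in it as a whole word.
add_host_entry() {
    if grep -qwF -- "$3" "$1" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$2" "$3" >> "$1"
}
```

For example:
add_host_entry /etc/hosts 172.27.10.75 es1.fgt.com
add_host_entry /etc/hosts 172.27.10.76 es2.fgt.com
add_host_entry /etc/hosts 172.27.10.77 es3.fgt.com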
Once all three machines are configured correctly, Elasticsearch can be started:
./bin/elasticsearch -d -p pid
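-d runs Elasticsearch in the background as a daemon; -p pid writes the process ID to a file named pid
To stop it:
pkill -F pid
Check cluster health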
curl 172.27.10.76:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1563269143 09:25:43 fgt_es-server green 3 3 84 42 0 0 0 0 - 100.0%
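For scripts that need only the status colour, the fourth column of the output above can be extracted. A small sketch; `health_status` is a hypothetical helper that accepts the output with or without the ?v header line.

```shell
#!/bin/sh
# health_status
# Reads _cat/health output on stdin and prints the status column.
# The header line produced by ?v (it starts with "epoch") is skipped.
health_status() {
    awk '$1 == "epoch" { next } { print $4 }'
}
```

For example: curl -s '172.27.10.76:9200/_cat/health?v' | health_status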
List the cluster's indices
# curl 172.27.10.76:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open yqj-javarisk01-2019.07.16 tJDuZaJJSKmzWdy-fp0Ufw 1 1 49115 0 328.7mb 165.2mb
green open yqj-app-2019.07.15 PYA5H5IdQoyY3zojgxbqCA 1 1 25402301 0 20.3gb 10.1gb
green open yqj-channel-manage-2019.07.16 80iqy70HSU-aUWwBY30h1w 1 1 701 0 484kb 228.7kb
green open .kibana_2 iApB0veGQKCvAALGrNiwqw 1 1 17 8 163.1kb 81.5kb
green open yql-settlement-2019.07.16 OR-a8icLRzCIFiqqbpqOxA 1 1 131023 0 134.3mb 63.9mb
green open yql-javarisk02-2019.07.13 fKZKMpm_QZCopM57UeieuA 1 1 587 0 2.1mb 1mb
green open yql-app-2019.07.15 WlwCiwX2Q4CKbSnCnGTmcQ 1 1 2414334 0 1.7gb 912.9mb
green open yqj-app-2019.07.16 hAJuW7CXRSiX9hxY7RadqA 1 1 10822137 0 11gb 6.2gb
green open yql-javarisk01-2019.07.13 aghXwJDgQpmgDiPEZYryuw 1 1 18 0 45.2kb 22.6kb
Check disk usage across the cluster
# curl 172.27.10.76:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
28 16.7gb 22.2gb 74.7gb 96.9gb 22 es1.fgt.com 172.27.10.75 es1.fgt.com
28 11gb 15.4gb 81.5gb 96.9gb 15 es2.fgt.com 172.27.10.76 es2.fgt.com
28 7.7gb 12.4gb 84.5gb 96.9gb 12 es3.fgt.com 172.27.10.77 es3.fgt.com
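The disk.percent column is the one worth watching, since Elasticsearch stops allocating new shards to a node once it passes the low disk watermark (85% by default). The sketch below flags nodes at or above a chosen threshold; `nodes_over_disk` is a hypothetical helper and assumes the ?v column layout shown above.

```shell
#!/bin/sh
# nodes_over_disk THRESHOLD
# Reads _cat/allocation?v output on stdin and prints "node percent" for
# every node whose disk.percent (column 6) is at or above THRESHOLD.
nodes_over_disk() {
    awk -v limit="$1" 'NR > 1 && $6 + 0 >= limit + 0 { print $9, $6 }'
}
```

For example: curl -s '172.27.10.76:9200/_cat/allocation?v' | nodes_over_disk 80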
List the cluster's nodes
# curl 172.27.10.76:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.27.10.75 41 85 5 0.02 0.05 0.05 mdi - es1.fgt.com
172.27.10.76 54 97 2 0.08 0.03 0.05 mdi * es2.fgt.com
172.27.10.77 59 89 2 0.01 0.04 0.11 mdi - es3.fgt.com
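In this output the elected master is the row marked with * in the master column (es2.fgt.com here). A one-function sketch for picking it out in a script; `elected_master` is a hypothetical helper and assumes the ?v column layout shown above.

```shell
#!/bin/sh
# elected_master
# Reads _cat/nodes?v output on stdin and prints the name (column 10) of
# the node whose master column (column 9) is "*".
elected_master() {
    awk 'NR > 1 && $9 == "*" { print $10 }'
}
```

For example: curl -s '172.27.10.76:9200/_cat/nodes?v' | elected_master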
Logstash is deployed with the same steps as ES, so we go straight to its configuration file:
cat filebeat_redis_logstash.conf
input {
redis {
data_type => "list"
key => "yql-app-key"
host => "redis.fgt.com"
port => 9998
db => 0
threads => 4
type => "yql-app"
}
redis {
data_type => "list"
key => "yql-app-manage-key"
host => "redis.fgt.com"
port => 9998
db => 1
threads => 4
type => "yql-app-manage"
}
}
filter {
if [type] == "yql-app" {
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:Datetime}\|(?<Loglevel>(.{5}))\|(?<Thread>([a-zA-Z0-9-]{1,}))\|%{IP:Ipaddress}\|%{URIPATH:Url}\|(?<Platform>(.*))\|(?<Channel>(.*))\|(?<ClientVersion>(.*))\|(?<DeviceCode>(\d+))\|(?<Model>(.*))\|(?<OSVersion>([a-zA-Z0-9.]{1,}))\|(?<UserToken>(.*))\|(?<Token>(.*))\|%{BASE10NUM:SerialNumber}\|(?<Logger>(.*))\|(?<Msg>(.*))"
}
remove_field => [ "message", "@version" ]
}
}
else if [type] == "yql-app-manage" {
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:Datetime}\|(?<Loglevel>(.{5}))\|(?<Thread>([a-zA-Z0-9_-]{1,}))\|(?<Ipaddress>(.*))\|(?<Url>(.*))\|(?<Platform>(.*))\|(?<Channel>(.*))\|(?<ClientVersion>(.*))\|(?<DeviceCode>(.*))\|(?<Model>(.*))\|(?<None>(.*))\|(?<UserToken>(.*))\|(?<Token>(.*))\|%{BASE10NUM:SerialNumber}|((.*))\|(?<Logger>(.*))\|(?<Msg>(.*))"
}
remove_field => [ "message", "@version" ]
}
}
}
output {
elasticsearch {
hosts => ["es1.fgt.com:9200", "es2.fgt.com:9200", "es3.fgt.com:9200"]
index => "%{type}-%{+YYYY.MM.dd}"
sniffing => true
template_overwrite => true
}
stdout {
codec => "rubydebug"
}
}
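The Filebeat side of the pipeline has to write to the same Redis list that a redis input above reads from. The sketch below generates a minimal filebeat.yml for that purpose. It is an assumption-laden illustration, not the configuration used in this deployment: `render_filebeat_config` is a hypothetical helper and the log path in the usage note is made up, while the Redis host, port, key and db are taken from the first redis input above.

```shell
#!/bin/sh
# render_filebeat_config LOG_GLOB REDIS_KEY REDIS_DB
# Prints a minimal filebeat.yml that ships matching log files to a Redis
# list. KEY and DB must match the corresponding Logstash redis input.
render_filebeat_config() {
    cat <<EOF
filebeat.inputs:
  - type: log
    paths:
      - $1
output.redis:
  hosts: ["redis.fgt.com:9998"]
  key: "$2"
  db: $3
  datatype: list
EOF
}
```

For example, with a hypothetical log location:
render_filebeat_config '/export/logs/yql-app/*.log' yql-app-key 0 > filebeat.yml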
Starting Logstash
./bin/logstash -f $logstashdir/config/filebeat_redis_logstash.conf --config.reload.automatic
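--config.reload.automatic makes Logstash reload the configuration file automatically when it changes
The grok filter is where you write the regular expressions that match your log lines. For configuration details, see the official documentation: https://www.elastic.co/guide/en/logstash/current/configuration.html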
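When writing a grok pattern it helps to have a sample line to check it against. The first pattern above expects 16 fields separated by |. The sketch below prints one such line and counts its fields; every value in the sample is invented for illustration, and both function names are hypothetical.

```shell
#!/bin/sh
# sample_log_line
# Prints one made-up log line with the 16 pipe-separated fields named in
# the first grok pattern (Datetime, Loglevel, Thread, ..., Logger, Msg).
sample_log_line() {
    printf '%s\n' '2019-07-16T09:25:43.123|INFO |http-nio-8080-exec-1|172.27.10.80|/api/v1/login|android|official|1.2.3|12345|Pixel3|9.0|utoken|token|1001|com.fgt.app.Login|login ok'
}

# field_count
# Prints the number of pipe-separated fields on each line read from stdin.
field_count() {
    awk -F'|' '{ print NF }'
}
```

Running sample_log_line | field_count prints 16. A line of this shape can then be pasted into a grok debugger together with the pattern to confirm that every field is captured.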
Kibana configuration:
# egrep -v "^#|^$" kibana.yml
server.port: 5601
server.host: "172.27.10.79"
elasticsearch.hosts: ["http://es1.fgt.com:9200", "http://es2.fgt.com:9200", "http://es3.fgt.com:9200"]
logging.dest: stdout
i18n.locale: "zh-CN"
Starting Kibana
./bin/kibana