Building and Maintaining a Novel Platform on an LNMP Environment

dogfei

Published 2020-07-31 14:38:41


Included in the column: devops探索 (DevOps Exploration)

[Figure: architecture diagram of the novel platform]

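Machine list

Front end:

    192.168.111.25 / 192.168.111.26

Of these two machines, only one actually serves requests at the moment. Because of a restriction on Alibaba Cloud, keepalived cannot be used to make nginx highly available, so all requests are handled by a single machine, currently 192.168.111.26, which runs nginx, PHP, Redis and other services.

Back end:

    192.168.111.27 / 192.168.111.28

These two machines have weaker specs than the front-end machines but larger disks, so they run MySQL and are configured as master-master, each replicating from the other.

Because of limits in the current application framework, MySQL read/write splitting is handled by the Atlas middleware. Atlas currently runs on 192.168.111.26, so every database query goes to the Atlas service on that machine. The default request address is 127.0.0.1, the default proxy port is 1234, and the default admin port is 2345.

After that comes the ongoing maintenance of the whole architecture: data backups, deploying and rolling back code, and monitoring and alerting for services and system performance.

With the overall architecture understood, let's start the implementation.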

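Setting up the LNMP environment

We used an automated installation script for this step; the details are covered in another article.

MySQL master-master replication and read/write splitting

Two machines, A and B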

Edit the configuration file on A:

    binlog_format=mixed
    server-id = 1
    relay-log=relay-bin
    relay-log-index=slave-relay-bin.index
    binlog-ignore-db = mysql,information_schema
    # A hands out odd ids: start at 1, step by 2
    auto-increment-increment=2
    auto-increment-offset=1
    # Skips every replication error, so replication never stops but data can drift silently
    slave-skip-errors = all

Edit the configuration file on B:

    binlog_format=mixed
    server-id = 2
    relay-log=relay-bin
    relay-log-index=slave-relay-bin.index
    binlog-ignore-db = mysql,information_schema
    # B hands out even ids: start at 2, step by 2
    auto-increment-increment=2
    auto-increment-offset=2
    slave-skip-errors = all
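The two auto-increment settings are what keep the masters from generating the same primary key. Here is a minimal sketch of the idea in Python; `generated_ids` is a hypothetical helper of mine, and it models only the arithmetic series, not how MySQL resumes after gaps or manual inserts.

```python
def generated_ids(offset, increment, count):
    """First `count` ids of the series offset, offset + increment, ..."""
    return [offset + i * increment for i in range(count)]


# Server A: auto-increment-offset=1, auto-increment-increment=2
ids_a = generated_ids(offset=1, increment=2, count=5)
# Server B: auto-increment-offset=2, auto-increment-increment=2
ids_b = generated_ids(offset=2, increment=2, count=5)

print(ids_a)                    # odd ids
print(ids_b)                    # even ids
print(set(ids_a) & set(ids_b))  # empty set: the two series never collide
```

With N masters the same scheme applies with an increment of N and offsets 1 through N.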

Grant the replication privilege on each machine. On A, allow B to connect; on B, allow A to connect:

    mysql> grant replication slave on *.* to 'xs'@'B' identified by '123456';
    mysql> grant replication slave on *.* to 'xs'@'A' identified by '123456';

Before the next step, run show master status; on both machines to get the binary log file name and position. Then point each machine at the other (the first statement runs on B, the second on A):

    mysql> change master to master_host='A',master_user='xs',master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=222;
    mysql> change master to master_host='B',master_user='xs',master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=223;

Finally, start the slave threads. This must be done on both machines!

    mysql> start slave;

Verify (note that \G already terminates the statement, so no trailing semicolon is needed):

    mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: A or B
                      Master_User: xs
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000055
              Read_Master_Log_Pos: 184398434
                   Relay_Log_File: relay-bin.000127
                    Relay_Log_Pos: 184398597
            Relay_Master_Log_File: mysql-bin.000055
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB:
              Replicate_Ignore_DB:
               Replicate_Do_Table:

Note

If Slave_IO_Running stays at Connecting, a firewall may be the cause; allow the other machine's IP through.
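The check described above is easy to automate. Below is a small sketch, assuming the output of `show slave status\G` has been captured as text; `parse_slave_status` and `replication_healthy` are names I made up for illustration, and the sample is trimmed to a few fields.

```python
def parse_slave_status(text):
    """Turn the 'Key: value' lines of SHOW SLAVE STATUS\\G into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip the '*** 1. row ***' separator and anything without a colon
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields):
    """Both replication threads must report Yes."""
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")


SAMPLE = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: A
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
"""

status = parse_slave_status(SAMPLE)
print(status["Master_Host"], replication_healthy(status))
```

A value of Connecting for Slave_IO_Running makes `replication_healthy` return False, which is the case the note above is about.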

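Verifying replication is simple: create a database and a table on one master and check that they appear on the other. Next comes read/write splitting, for which we use a piece of middleware, Atlas.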

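A brief introduction to Atlas

Atlas is a data middle-layer project based on the MySQL protocol, developed and maintained by the infrastructure team of the Web Platform Department at Qihoo 360. It builds on MySQL-Proxy 0.8.2, the official MySQL release, fixing a large number of bugs and adding many features. According to the project, it is widely used inside 360: many MySQL services have been moved onto the Atlas platform, which handles several billion read and write requests per day. More than 50 companies have deployed Atlas in production and more than 800 people have joined the project's developer discussion group, and these numbers keep growing.

Main features:

Read/write splitting
Load balancing across slaves
IP filtering
Automatic table sharding
DBAs can bring DBs online and offline smoothly
Automatic removal of DBs that are down

Advantages of Atlas over the official MySQL-Proxy:

All Lua code in the main flow is rewritten in C; Lua is used only for the admin interface
Rewritten network model and thread model
A true connection pool
Optimized locking, with performance improved by tens of times

Installing Atlas

Download the rpm package from the official site and install it (Atlas can only be installed on 64-bit systems).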

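A reference configuration file follows:

    # 1. Sample configuration with explanations:
    [mysql-proxy]

    # (required, default is fine) user name for the admin interface
    admin-username = user

    # (required, default is fine) password for the admin interface
    admin-password = pwd

    # (required, set for your environment) IP and port of the master
    proxy-backend-addresses = 192.168.0.12:3306

    # (optional, set for your environment) IP and port of the slaves. The number after @
    # is the weight used for load balancing (default 1 if omitted). Several entries can be
    # given, separated by commas. To let the master share the read load as well, add the
    # master to this option.
    proxy-read-only-backend-addresses = 192.168.0.13:3306,192.168.0.14:3306

    # (required, set for your environment) user names with their encrypted MySQL passwords.
    # Passwords are encrypted with the encrypt program in PREFIX/bin; user name and password
    # are separated by a colon. The user must be created first on the master and the slaves,
    # with the same user name and password on all of them. For example, for user myuser with
    # password mypwd, ./encrypt mypwd returns HJBoxfRsjeI=. Separate multiple users with commas.
    pwds = myuser:HJBoxfRsjeI=,myuser2:HJBoxfRsjeI=

    # (required, default is fine) how Atlas runs: true runs it as a daemon, false runs it in
    # the foreground. Usually false for development and debugging, true in production.
    daemon = true

    # (required, default is fine) when true, Atlas starts two processes, a monitor and a
    # worker; the monitor restarts the worker if it exits unexpectedly. When false there is
    # only the worker. Usually false for development and debugging, true in production.
    keepalive = true

    # (required, set for your environment) number of worker threads; 2 to 4 times the number
    # of CPU cores is recommended
    event-threads = 4

    # (required, default is fine) log level: message, warning, critical, error or debug
    log-level = message

    # (required, default is fine) directory where logs are stored
    log-path = /usr/local/mysql-proxy/log

    # (required, set for your environment) SQL log switch: OFF, ON or REALTIME. OFF records
    # no SQL log. ON records the SQL log through a buffer that is flushed to disk only when
    # full. REALTIME is for debugging: the SQL log is written to disk immediately. Default OFF.
    sql-log = OFF

    # (optional) slow log setting. When set, only statements that run longer than
    # sql-log-slow (in ms) are logged. When not set, everything is logged.
    sql-log-slow = 10

    # (optional) close inactive client connections. When set, Atlas closes connections that
    # have been idle for wait-timeout seconds.
    wait-timeout = 10

    # (required, default is fine) IP and port of the working interface Atlas listens on
    proxy-address = 0.0.0.0:1234

    # (required, default is fine) IP and port of the admin interface Atlas listens on
    admin-address = 0.0.0.0:2345

    # (optional) sharding rule. Here person is the database, mt the table, id the sharding
    # column and 3 the number of sub-tables. Several rules can be given, separated by commas.
    # Omit this option if no table is sharded. The sub-tables must be created in advance and
    # are named table_number, with the number in [0, sub-table count - 1]; in this example
    # they are mt_0, mt_1 and mt_2.
    tables = person.mt.id.3

    # (optional) default character set; latin1 if not set
    charset = utf8

    # (optional) client IPs allowed to connect to Atlas, as exact IPs or IP segments separated
    # by commas. If not set, all IPs may connect; otherwise only the listed ones may.
    client-ips = 127.0.0.1, 192.168.1

    # (optional, rarely needed) IP of the physical NIC of the LVS in front of Atlas (not the
    # virtual IP). Required if LVS is used and client-ips is set; otherwise leave it unset.
    lvs-ips = 192.168.1.1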
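To make the tables rule more concrete, here is a rough sketch of how `person.mt.id.3` expands into sub-table names. The naming (`mt_0` to `mt_2`) comes from the configuration comments above; mapping a row to a sub-table by taking the sharding column modulo the sub-table count is my assumption about how Atlas routes rows, and the helper names are hypothetical.

```python
def parse_tables_rule(rule):
    """Split a rule such as 'person.mt.id.3' into its four parts."""
    database, table, column, count = rule.split(".")
    return database, table, column, int(count)


def sub_tables(rule):
    """All sub-table names that must be created in advance."""
    _, table, _, count = parse_tables_rule(rule)
    return ["%s_%d" % (table, i) for i in range(count)]


def route(rule, key_value):
    """Sub-table for one row. Assumption: key modulo sub-table count."""
    _, table, _, count = parse_tables_rule(rule)
    return "%s_%d" % (table, key_value % count)


print(sub_tables("person.mt.id.3"))
print(route("person.mt.id.3", 7))
```

Under that assumption a row with id 7 would land in mt_1, since 7 modulo 3 is 1.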

Starting Atlas

Here test is the instance name:

    ./mysql-proxyd test start|stop|restart

Connecting

    Admin interface:
    mysql -uxs -h127.0.0.1 -p123456 -P 2345
    Working interface:
    mysql -uxs -h127.0.0.1 -p123456 -P 1234

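Using mysqlreport

Tuning the MySQL configuration starts with finding the bottleneck. The data returned by show status is a useful reference, and mysqlreport combines and processes that data into a report that can guide the tuning. In our case, based on the report, we tuned query_cache_size, max_heap_table_size, innodb_buffer_pool_size, max_connections and other parameters.

After downloading the package, just unpack it and it is ready to use. The options are: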

    Command line options (abbreviations work):
       --user USER       Connect to MySQL as USER
       --password PASS   Use PASS or prompt for MySQL user's password
       --host ADDRESS    Connect to MySQL at ADDRESS
       --port PORT       Connect to MySQL at PORT
       --socket SOCKET   Connect to MySQL at SOCKET
       --no-mycnf        Don't read ~/.my.cnf
       --infile FILE     Read status values from FILE instead of MySQL
       --outfile FILE    Write report to FILE
       --email ADDRESS   Email report to ADDRESS (doesn't work on Windows)
       --flush-status    Issue FLUSH STATUS; after getting current values
       --relative X      Generate relative reports. If X is an integer,
                         reports are live from the MySQL server X seconds apart.
                         If X is a list of infiles (file1 file2 etc.),
                         reports are generated from the infiles in the order
                         that they are given.
       --report-count N  Collect N number of live relative reports (default 1)
       --detach          Fork and detach from terminal (run in background)
       --help            Prints this
       --debug           Print debugging information

Following the documentation, let's run it and look at the performance metrics:

    __ Key _________________________________________________________________
    Buffer used   251.00k of 128.00M  %Used:   0.19
      Current      23.59M            %Usage:  18.43
    Write hit       0.04%
    Read hit       99.99%

    __ Questions ___________________________________________________________
    Total         141.91M   140.5/s
      Com_         91.88M    91.0/s  %Total:  64.75
      QC Hits      46.13M    45.7/s           32.50
      DMS          33.02M    32.7/s           23.27
      -Unknown     29.30M    29.0/s           20.65
      COM_QUIT    186.04k     0.2/s            0.13
    Slow 10 s          21     0.0/s            0.00  %DMS:   0.00  Log:
    DMS            33.02M    32.7/s           23.27
      SELECT       23.27M    23.0/s           16.40        70.47
      UPDATE        6.26M     6.2/s            4.41        18.95
      INSERT        3.49M     3.5/s            2.46        10.58
      DELETE          501     0.0/s            0.00         0.00
      REPLACE           0       0/s            0.00         0.00
    Com_           91.88M    91.0/s           64.75
      change_db    70.14M    69.4/s           49.43
      begin         9.75M     9.7/s            6.87
      commit        9.75M     9.7/s            6.87

    __ SELECT and Sort _____________________________________________________
    Scan            9.05M     9.0/s %SELECT:  38.90
    Range         609.81k     0.6/s            2.62
    Full join           1     0.0/s            0.00
    Range check         0       0/s            0.00
    Full rng join       0       0/s            0.00
    Sort scan       3.33M     3.3/s
    Sort range    325.67k     0.3/s
    Sort mrg pass       0       0/s

    __ Query Cache _________________________________________________________
    Memory usage  150.03M of 256.00M  %Used:  58.60
    Block Fragmnt  19.31%
    Hits           46.13M    45.7/s
    Inserts        19.85M    19.6/s
    Insrt:Prune    4.26:1    15.0/s
    Hit:Insert     2.32:1

    __ Table Locks _________________________________________________________
    Waited              1     0.0/s  %Total:   0.00
    Immediate      32.78M    32.4/s

    __ Tables ______________________________________________________________
    Open              258 of 1024    %Cache:  25.20
    Opened            269     0.0/s

    __ Connections _________________________________________________________
    Max used          144 of 1024      %Max:   1.41
    Total         186.56k     0.2/s

    __ Created Temp ________________________________________________________
    Disk table        723     0.0/s
    Table           3.16M     3.1/s    Size: 256.0M
    File               12     0.0/s

    __ Threads _____________________________________________________________
    Running             3 of   45
    Cached             76 of  120      %Hit:  99.92
    Created           144     0.0/s
    Slow                0       0/s

    __ Aborted _____________________________________________________________
    Clients         1.04k     0.0/s
    Connects           67     0.0/s

    __ Bytes _______________________________________________________________
    Sent          113.60G  112.5k/s
    Received       18.96G   18.8k/s

    __ InnoDB Buffer Pool __________________________________________________
    Usage         895.88M of   1.00G  %Used:  87.50
    Read hit      100.00%
    Pages
      Free          8.19k            %Total:  12.50
      Data         53.01k                     80.90 %Drty:   0.00
      Misc           4322                      6.60
      Latched                                  0.00
    Reads          51.19G   50.7k/s
      From file     1.98M     2.0/s            0.00
      Ahead Rnd         0       0/s
      Ahead Sql                 0/s
    Writes         66.98M    66.3/s
    Flushes        26.34M    26.1/s
    Wait Free           0       0/s

    __ InnoDB Lock _________________________________________________________
    Waits               0       0/s
    Current             0
    Time acquiring
      Total             0 ms
      Average           0 ms
      Max               0 ms

    __ InnoDB Data, Pages, Rows ____________________________________________
    Data
      Reads         2.38M     2.4/s
      Writes       44.20M    43.8/s
      fsync        32.78M    32.4/s
      Pending
        Reads           0
        Writes          0
        fsync           0
    Pages
      Created     165.67k     0.2/s
      Read          2.38M     2.4/s
      Written      26.34M    26.1/s
    Rows
      Deleted     100.36k     0.1/s
      Inserted      3.47M     3.4/s
      Read        321.99G  318.7k/s
      Updated       3.42M     3.4/s

The meaning of each individual metric can be looked up on Baidu or Google.
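Several of the derived figures in the report are simple ratios of the raw counters, which is worth seeing once. The sketch below recomputes a few of them from the numbers printed above; `to_number` is a helper of my own, and because the report shows rounded values, the recomputed results only match to within rounding.

```python
UNITS = {"k": 1e3, "M": 1e6, "G": 1e9}


def to_number(text):
    """Convert a report figure such as '46.13M' or '501' to a float."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


qc_hits = to_number("46.13M")      # Query Cache: Hits
qc_inserts = to_number("19.85M")   # Query Cache: Inserts
total = to_number("141.91M")       # Questions: Total
selects = to_number("23.27M")      # DMS: SELECT
dms = to_number("33.02M")          # Questions: DMS

hit_insert_ratio = qc_hits / qc_inserts   # reported as Hit:Insert 2.32:1
qc_share = qc_hits / total * 100          # reported as QC Hits 32.50
select_share = selects / dms * 100        # reported as SELECT 70.47
pool_used = 895.88 / 1024 * 100           # InnoDB Buffer Pool %Used, 895.88M of 1.00G

print(round(hit_insert_ratio, 2), round(qc_share, 1),
      round(select_share, 2), round(pool_used, 1))
```

A Hit:Insert ratio above 1 means cached results are reused more often than new ones are stored, and a buffer pool close to full is one reason to look at innodb_buffer_pool_size.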

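Database backup and restore

The database here means MySQL. There are two backup schemes: full backups and incremental backups. Full backups currently run every Monday at 4 a.m.; incremental backups run every day.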
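As a small illustration of that schedule, the sketch below decides which backup modes are due on a given day. It assumes, and this is my assumption rather than something stated above, that both jobs are triggered from one daily 04:00 run; the mode names match the arguments of the backup script shown further down.

```python
from datetime import datetime


def due_jobs(now):
    """Backup modes to run for the daily 04:00 slot on the day of `now`."""
    jobs = []
    # datetime.weekday() returns 0 for Monday: weekly full backup
    if now.weekday() == 0:
        jobs.append("all_back")
    # Incremental (binlog) backup runs every day
    jobs.append("day_back")
    return jobs


print(due_jobs(datetime(2018, 1, 29, 4, 0)))  # a Monday
print(due_jobs(datetime(2018, 1, 30, 4, 0)))  # a Tuesday
```

In practice the same schedule is usually expressed as two cron entries rather than in code.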

There are two backup tools: mysqldump and xtrabackup.

This article covers mysqldump; for xtrabackup, see my blog.

Reference script:

    #!/bin/bash
    # MySQL backup: binlog copies (incremental) and mysqldump (full)
    export LANG=en_US.UTF-8
    # Make a failing mysqldump visible even though its output is piped through gzip
    set -o pipefail

    ############ Data directories ############
    BakDir=/backup_sql_55/mysql_data/backup
    BinDir=/usr/local/mysql/data
    ############ Log files for the script's results ############
    LogFile=/backup_sql_55/mysql_data/log/binlog.log
    LOGFILE=/backup_sql_55/mysql_data/log/bak.log
    ############ Binlog index file ############
    BinFile=/usr/local/mysql/data/mysql-bin.index
    DATE=$(date +%Y%m%d)

    test -d "$BakDir" || mkdir -p "$BakDir"

    ############ Incremental backup ############
    day_back()
    {
        # Switch to a new binlog so that every earlier file is complete
        mysqladmin -uroot -hlocalhost -pclickwise10050 flush-logs
        Counter=$(wc -l < "$BinFile")
        NextNum=0
        for file in $(cat "$BinFile")
        do
            base=$(basename "$file")
            NextNum=$((NextNum + 1))
            if [ "$NextNum" -eq "$Counter" ]
            then
                # The last entry is the binlog currently being written
                echo "$base skip!" >> "$LogFile"
            else
                # One archive per binlog, so one run cannot overwrite another file's archive
                dest=$BakDir/$base.tgz
                if [ -e "$dest" ]
                then
                    echo "$base exist!" >> "$LogFile"
                else
                    cp "$BinDir/$base" "$BakDir"
                    (cd "$BakDir" && tar zcf "$base.tgz" "$base" && rm -f "$base")
                    echo "$base copying" >> "$LogFile"
                fi
            fi
        done
        echo "$(date +"%Y-%m-%d %H:%M:%S") Backup successful!" >> "$LogFile"
    }

    ############ Full backup ############
    all_back()
    {
        mysqldump -uroot -hlocalhost -pclickwise10050 --databases novel --flush-logs | gzip > "$BakDir/novel_${DATE}.sql.gz"
        if [ $? -eq 0 ];then
            echo "$DATE:SUCCESSFUL" >> "$LOGFILE"
        else
            echo "$DATE:FAIL" >> "$LOGFILE"
            echo "mysql backup is fail" | mail -s "warning" chenfei_123zz@163.com chenfei@clickwise.cn
        fi
    }

    ############ Delete old files ############
    del_back()
    {
        find /data/log/mysql -type f -name "mysql-bin.0*" -ctime +30 | xargs rm -f
    }

    case $1 in
        day_back)
            day_back
            ;;
        all_back)
            all_back
            ;;
        del_back)
            del_back
            ;;
        *)
            echo "pls input (day_back|all_back|del_back)"
    esac

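Data restore

A full backup made with mysqldump is plain SQL and can be fed directly to the mysql client. A binlog file is binary, so it has to be decoded with mysqlbinlog before it is replayed:

    gunzip < novel_YYYYMMDD.sql.gz | mysql -uxs -p123456
    mysqlbinlog mysql-bin.000001 | mysql -uxs -p123456

An uncompressed dump can also be loaded from inside the client:

    mysql> source novel_YYYYMMDD.sql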

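Monitoring and alerting for each service

Monitoring mainly covers whether each service is alive and how system resources are being used.

We use Zabbix together with scripts.

Installing zabbix_agent is not covered here. After installation, the following settings need to be changed:

    # vim zabbix_agentd.conf
    # Address of the zabbix_server host
    Server=192.168.111.11
    # Address of the zabbix_server host
    ServerActive=192.168.111.11
    # Address of this machine
    Hostname=192.168.111.22
    Include=/usr/local/zabbix_agent/etc/zabbix_agentd.conf.d/*.conf
    UnsafeUserParameters=1

Custom keys are defined to fetch the state of each service, which is how the services get monitored.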

Monitoring nginx

Write a script that returns the value for each key:

    #!/bin/bash
    # Print one field of the nginx stub_status page; the field name is passed as $1
    host="localhost"
    port="8080"
    statusurl="/nginx_status"

    active() {
        curl -s http://${host}:${port}${statusurl} | awk '/^Active/{print $3}'
    }

    accepts() {
        curl -s http://${host}:${port}${statusurl} | awk 'NR==3{print $1}'
    }

    handled() {
        curl -s http://${host}:${port}${statusurl} | awk 'NR==3{print $2}'
    }

    requests() {
        curl -s http://${host}:${port}${statusurl} | awk 'NR==3{print $3}'
    }

    reading() {
        curl -s http://${host}:${port}${statusurl} | awk 'NR==4{print $2}'
    }

    writing() {
        curl -s http://${host}:${port}${statusurl} | awk 'NR==4{print $4}'
    }

    waiting() {
        curl -s http://${host}:${port}${statusurl} | awk 'NR==4{print $6}'
    }

    # Call the function whose name was given as the first argument
    $1
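The awk expressions in the script depend on the fixed four-line layout of the stub_status page. As a cross-check, here is a sketch of the same extraction in Python working on a captured page; the sample numbers are illustrative and `parse_stub_status` is a hypothetical helper, not part of the Zabbix setup above.

```python
import re

SAMPLE = """\
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Extract the seven counters from an nginx stub_status page."""
    lines = text.strip().splitlines()
    # Line 1: 'Active connections: N'
    active = int(lines[0].split(":")[1])
    # Line 3: three totals, in the order named on line 2
    accepts, handled, requests = (int(n) for n in lines[2].split())
    # Line 4: 'Reading: a Writing: b Waiting: c'
    states = dict(re.findall(r"(\w+):\s*(\d+)", lines[3]))
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": int(states["Reading"]),
        "writing": int(states["Writing"]),
        "waiting": int(states["Waiting"]),
    }


stats = parse_stub_status(SAMPLE)
print(stats["active"], stats["requests"], stats["waiting"])
```

Parsing the page once and returning all counters also avoids one HTTP request per key, at the cost of a slightly more involved Zabbix item setup.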

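Edit the zabbix_agent sub-configuration file. All sub-configuration files now live in the zabbix_agentd.conf.d directory:

    [root@xs_82_208]# cat user_parameter_nginx.conf
    UserParameter=nginx.status[*],/usr/local/zabbix_agent/script/monitor_nginx.sh $1

After configuring, reload Zabbix, then test on the zabbix_server whether the value can be fetched:

    [root@iZ2zbin]# ./zabbix_get -s 192.168.111.11 -p 10050 -k "nginx.status[accepts]"
    248700

For the configuration steps on the Zabbix side, see the blog series zabbix详解 (Zabbix in Detail).

With Zabbix in place, we also use scripts to monitor whether nginx and PHP are running. This relies on the status modules built into nginx and PHP, so only the configuration files need to change.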

    [root@xs_82_208 /usr/local/nginx/conf/conf.d]# nginx -V
    nginx version: xs/1.12.0
    built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
    built with OpenSSL 1.0.2k-fips 26 Jan 2017
    TLS SNI support enabled
    configure arguments: --prefix=/usr/local/nginx --sbin-path=/usr/local/nginx/sbin/nginx --modules-path=/usr/local/nginx/modules --conf-path=/usr/local/nginx/conf/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/usr/local/nginx/tmp/client_body --http-proxy-temp-path=/usr/local/nginx/tmp/proxy --http-fastcgi-temp-path=/usr/local/nginx/tmp/fastcgi --http-uwsgi-temp-path=/usr/local/nginx/tmp/uwsgi --http-scgi-temp-path=/usr/local/nginx/tmp/scgi --pid-path=/usr/local/nginx/run/nginx.pid --lock-path=/usr/local/nginx/run/lock/nginx --user=nginx --group=nginx --with-file-aio --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' --with-ld-opt=' -Wl,-E'

Seeing --with-http_stub_status_module in the output means the module is already built in.

Add pm.status_path = /php_status to the php-fpm configuration file.

Add the following to the nginx configuration file:

    # nginx status check
    location ~ /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        allow 172.17.24.139;
        deny all;
    }

    # php status check
    location ~ /php_status {
        allow 127.0.0.1;
        deny all;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
        fastcgi_pass 127.0.0.1:9000;
    }

    [root@xs_test_bk conf.d]# curl -I 127.0.0.1/nginx_status
    HTTP/1.1 301 Moved Permanently
    Server: xs
    Date: Thu, 08 Feb 2018 11:16:22 GMT
    Content-Type: text/html
    Content-Length: 175
    Connection: keep-alive
    Location: https://xs.bfnet.cn/nginx_status

    [root@xs_test_bk conf.d]# curl -I 127.0.0.1/php_status
    HTTP/1.1 301 Moved Permanently
    Server: xs
    Date: Thu, 08 Feb 2018 11:16:34 GMT
    Content-Type: text/html
    Content-Length: 175
    Connection: keep-alive
    Location: https://xs.bfnet.cn/php_status

    #!/bin/bash
    # Restart nginx or php-fpm when the PHP check page stops answering
    py=python
    T=`date "+%Y%m%d%H%M"`
    mail=chenfei@clickwise.cn
    # -f makes curl exit non-zero on HTTP errors such as 502, not only on connection failures
    curl -sfI 127.0.0.1:8080/chk_php.php &>/dev/null
    if [ $? -ne 0 ];then
        curl -sfI 127.0.0.1:8080/nginx_status &>/dev/null
        if [ $? -ne 0 ];then
            # nginx itself is not answering
            /etc/init.d/nginx restart
            $py send.py "$mail" "warning" "nginx is restart"
        else
            # nginx is fine, so the PHP side is the problem
            /etc/init.d/php-fpm restart
            $py send.py "$mail" "warning" "php is restart"
        fi
    fi

Monitoring system memory and php-fpm memory usage

    #!/bin/bash
    py=python
    mail=test@163.com
    # Total physical memory in MB, used to convert top's %MEM into MB
    total_mb=7822

    # Percentage of physical memory in use, from free -m
    mem_percent() {
        free -m | grep Mem | awk '{printf ("%.f\n"),$3/$2*100}'
    }

    # Memory used by all php-fpm processes in MB: sum of top's %MEM column (field 10)
    php_mem_mb() {
        top -b -n 1 | grep php-fpm | awk -v total="$total_mb" '{sum+=$10}END{printf ("%.f"),sum/100 * total}'
    }

    Memory() {
        mem=$(mem_percent)
        if [ "$mem" -gt 75 ];then
            php_pro=$(ps auxf | grep php-fpm | grep -v grep | wc -l)
            if [ "$php_pro" -lt 31 ];then
                $py send.py "$mail" "mem is high" "now is $mem%"
            else
                /etc/init.d/php-fpm restart
                # Measure again after the restart so the mail reports the current value
                mem=$(mem_percent)
                [ "$mem" -lt 75 ] && $py send.py "$mail" "mem is recovery" "now is $mem%"
            fi
        fi
    }

    MemoryPhpUsed() {
        count=2
        while [ $count -gt 0 ]
        do
            # Measure on every pass so a retry sees the value after the restart
            sum=$(php_mem_mb)
            if [ "$sum" -gt 2500 ];then
                /etc/init.d/php-fpm restart
                $py send.py "$mail" "php info" "php now is restarting,used memory is $sum"
                sum2=$(php_mem_mb)
                if [ "$sum2" -lt "$sum" ];then
                    echo "restart successful"
                else
                    $py send.py "$mail" "php warning" "php use memory too high"
                fi
            fi
            let "count--"
        done
    }

    case $1 in
        mem)
            Memory
            ;;
        php_mem)
            MemoryPhpUsed
            ;;
    esac

top is used here to get the memory used by the php-fpm processes. The small trick is to run it in batch mode (top -b -n 1) so that its output can be piped or redirected.
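The two numbers the script works with are plain arithmetic: used divided by total memory, and the sum of the %MEM column multiplied by total memory. The sketch below repeats that arithmetic on captured text so it can be checked by hand. The sample `free -m` and `top -b -n 1` output is made up for illustration, the helper names are mine, and 7822 MB is the total-memory figure hard-coded in the script above.

```python
FREE_SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:           7822        6100         300         120        1422        1300
Swap:             0           0           0
"""

TOP_SAMPLE = """\
  457 www       20   0  412340  81920  40960 S   1.3  1.0   0:12.34 php-fpm
  458 www       20   0  412340  98304  40960 S   0.7  1.2   0:10.01 php-fpm
  900 mysql     20   0 2412340 901120  10960 S   3.0 11.2  10:10.01 mysqld
"""


def mem_used_percent(free_output):
    """Used / total from the Mem: line of free -m, as a whole percentage."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            cols = line.split()
            total, used = int(cols[1]), int(cols[2])
            return round(used / total * 100)
    raise ValueError("no Mem: line found")


def php_fpm_mem_mb(top_output, total_mb):
    """Sum the %MEM column (10th field) of php-fpm rows and convert to MB."""
    share = 0.0
    for line in top_output.splitlines():
        cols = line.split()
        if len(cols) >= 12 and "php-fpm" in cols[11]:
            share += float(cols[9])
    return round(share / 100 * total_mb)


print(mem_used_percent(FREE_SAMPLE))
print(php_fpm_mem_mb(TOP_SAMPLE, total_mb=7822))
```

Because %MEM is reported with one decimal place, the MB figure is only an estimate, which is good enough for a threshold such as 2500 MB.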

For more on top, see: top详解 (top in detail)

Alert program

    #!/usr/bin/python
    # coding=utf-8
    # Usage: python send.py <recipient> <subject> <content>
    import smtplib
    import sys
    from email.mime.text import MIMEText
    from email.utils import formataddr


    class SendMail:
        mail_host = "smtp.163.com"
        mail_user = "test@163.com"
        mail_pass = "test"

        def __init__(self, sub, content):
            # mail_user is already a full address, so no domain suffix is appended
            self.me = formataddr(("陈菲", self.mail_user))
            self.msg = MIMEText(content, _subtype='html', _charset='utf-8')
            self.msg['From'] = self.me
            self.msg['Subject'] = sub

        def send_message(self, users):
            self.msg['To'] = ", ".join(users)
            try:
                s = smtplib.SMTP_SSL(host=self.mail_host, port=465)
                s.login(self.mail_user, self.mail_pass)
                # Send the mail; the envelope sender is the bare address
                s.sendmail(self.mail_user, users, self.msg.as_string())
                s.close()
            except Exception as e:
                print(str(e))


    if __name__ == '__main__':
        mailto_list = [sys.argv[1]]
        s = SendMail(sys.argv[2], sys.argv[3])
        s.send_message(mailto_list)

Monitoring MySQL

Define a custom key and fetch its value:

    [root@xs_88_55 /usr/local/zabbix_agent/etc/zabbix_agentd.conf.d]# cat userparameter_mysql.conf
    UserParameter=mysql.status[*],/usr/local/zabbix_agent/script/monitor_mysql.sh $1

The script behind the key:

    #!/bin/bash
    MYSQLADMIN_BIN="/usr/local/mysql/bin/mysqladmin"
    MYSQL_BIN="/usr/local/mysql/bin/mysql"
    MYSQL_FILE="/etc/my.cnf"

    # For the status checks below, 1 means normal and 0 means error
    case $1 in
        # system
        Status)
            tmp1=`${MYSQLADMIN_BIN} --defaults-file=${MYSQL_FILE} ping | grep -c alive`
            [[ $tmp1 == "1" ]] && echo "1" || echo "0"
            ;;
        # slave
        Slave_IO_Running)
            tmp1=`${MYSQL_BIN} --defaults-file=${MYSQL_FILE} -e "show slave status\G" | grep Slave_IO_Running | awk '{print $2}'`
            [[ $tmp1 == "Yes" ]] && echo "1" || echo "0"
            ;;
        Slave_SQL_Running)
            tmp1=`${MYSQL_BIN} --defaults-file=${MYSQL_FILE} -e "show slave status\G" | grep Slave_SQL_Running: | awk '{print $2}'`
            [[ $tmp1 == "Yes" ]] && echo "1" || echo "0"
            ;;
        Seconds_Behind_Master)
            tmp1=`${MYSQL_BIN} --defaults-file=${MYSQL_FILE} -e "show slave status\G" | grep Seconds_Behind_Master | awk '{print $2}'`
            [[ $tmp1 == "0" ]] && echo "1" || echo "0"
            ;;
        Last_IO_Errno)
            tmp1=`${MYSQL_BIN} --defaults-file=${MYSQL_FILE} -e "show slave status\G" | grep Last_IO_Errno | awk '{print $2}'`
            [[ $tmp1 == "0" ]] && echo "1" || echo "0"
            ;;
        Last_SQL_Errno)
            tmp1=`${MYSQL_BIN} --defaults-file=${MYSQL_FILE} -e "show slave status\G" | grep Last_SQL_Errno | awk '{print $2}'`
            [[ $tmp1 == "0" ]] && echo "1" || echo "0"
            ;;
        Slave_IO_State)
            tmp1=`${MYSQL_BIN} --defaults-file=${MYSQL_FILE} -e "show slave status\G" | grep Slave_IO_State | awk '{print $2}'`
            [[ $tmp1 == "Waiting" ]] && echo "1" || echo "0"
            ;;
        # query
        # Queries per second, type: float
        # Queries per second avg: 0.000
        Questions)
            tmp1=`${MYSQLADMIN_BIN} --defaults-file=${MYSQL_FILE} status | awk '{print $22}'`
            echo "$tmp1"
            ;;
        # Slow queries
        # Slow queries: 0 (cumulative), type: uint
        Slow_queries)
            tmp1=`${MYSQLADMIN_BIN} --defaults-file=${MYSQL_FILE} status | awk '{print $9}'`
            echo "$tmp1"
            ;;
    esac
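The last two cases pick fields 22 and 9 out of the single line printed by `mysqladmin status`, which works but is tied to the exact word positions. As an alternative sketch, the line can be parsed by label instead. The sample line below uses illustrative values, and `parse_mysqladmin_status` is a hypothetical helper rather than part of the monitoring script.

```python
import re

SAMPLE = ("Uptime: 1010236  Threads: 3  Questions: 141910000  Slow queries: 21  "
          "Opens: 269  Flush tables: 1  Open tables: 258  "
          "Queries per second avg: 140.472")


def parse_mysqladmin_status(line):
    """Map each label of a `mysqladmin status` line to its numeric value."""
    pairs = re.findall(r"([A-Za-z ]+?):\s*([\d.]+)", line)
    return {name.strip(): float(value) for name, value in pairs}


status = parse_mysqladmin_status(SAMPLE)
print(status["Slow queries"])            # same value as awk field 9
print(status["Queries per second avg"])  # same value as awk field 22
```

Looking values up by label keeps working if a future version adds or reorders fields, whereas positional awk fields would silently return the wrong number.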

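Application tuning and debugging

PHP tuning and debugging

1. Enable opcache in PHP

    [opcache]
    ; Zend Optimizer+ switch; when it is off, code is no longer optimized
    opcache.enable=1
    opcache.enable_cli=1
    ; Put PHP's code (text segment) on huge pages
    opcache.huge_code_pages=1
    ; Also cache opcodes in external files; noticeably faster for some scripts
    opcache.file_cache=/home/tmp
    zend_extension=opcache.so
    ; Size of the Zend Optimizer+ shared memory, i.e. how much precompiled PHP code can be stored
    opcache.memory_consumption=128
    ; Total memory for strings in the Zend Optimizer+ interned strings pool
    opcache.interned_strings_buffer=8
    ; Maximum number of cached files, between 200 and 100000
    opcache.max_accelerated_files=4000
    ; When wasted memory reaches this percentage, a restart is scheduled
    opcache.max_wasted_percentage=5
    ; When enabled, Zend Optimizer+ appends the current working directory to the script key,
    ; which removes key collisions between files with the same name. Disabling it improves
    ; performance but can break existing applications.
    opcache.use_cwd=0
    ; Fast shutdown: memory is reclaimed faster during PHP Request Shutdown
    opcache.fast_shutdown=1
    ; Check files for updates every 60 s; 0 means check on every request
    opcache.revalidate_freq=60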

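2. Kernel parameter tuning: Hugepage

With this setting enabled, PHP moves its own TEXT segment (the executable code) onto huge pages, which improves the site's QPS.

By default, memory is paged in 4 KB units. Virtual addresses have to be translated to physical addresses, and that translation requires a table lookup. To speed up the lookup, the CPU has a built-in TLB (Translation Lookaside Buffer). The smaller the virtual page, the more entries the table needs; since the TLB has a limited size, more entries mean more TLB cache misses. Enabling large memory pages therefore reduces TLB cache misses indirectly and improves performance.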
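The effect of page size on the number of translation entries is easy to quantify. The sketch below counts how many pages, and therefore how many page-table entries competing for TLB slots, are needed to map a region; the 256 MB region is an arbitrary example of mine, and 2 MB is the huge page size reported by /proc/meminfo on the machine in this article.

```python
KB = 1024
MB = 1024 * KB


def pages_needed(region_bytes, page_bytes):
    """Number of pages required to map a region (ceiling division)."""
    return -(-region_bytes // page_bytes)


region = 256 * MB
small = pages_needed(region, 4 * KB)  # default 4 KB pages
huge = pages_needed(region, 2 * MB)   # 2 MB huge pages

print(small, huge, small // huge)
```

Each 2 MB huge page replaces 512 ordinary pages, so the same code and data occupy far fewer TLB entries.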

Check the memory information:

    [root@xs]# cat /proc/meminfo | grep -i huge
    AnonHugePages:    163840 kB
    HugePages_Total:     345
    HugePages_Free:      330
    HugePages_Rsvd:       54
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

The huge page size here is 2 MB. HugePages_Total shows how many huge pages the kernel has reserved; a value of 0 would mean HugePages is not enabled.

We enable this feature through opcache by setting:

    opcache.huge_code_pages=1

With opcache configured, allocate some huge pages:

    sudo sysctl vm.nr_hugepages=128

Check the memory information again:

    [root@xs]# cat /proc/meminfo | grep -i huge
    AnonHugePages:    169984 kB
    HugePages_Total:     128
    HugePages_Free:      113
    HugePages_Rsvd:       54
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

The setting has taken effect. Once this is done, restart php-fpm.

After PHP has been running for a while, the log may show messages like these:

    [07-Feb-2018 22:19:20] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 39 total children
    [07-Feb-2018 22:19:46] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 47 total children
    [07-Feb-2018 22:19:47] WARNING: [pool www] server reached pm.max_children setting (50), consider raising it
    [07-Feb-2018 22:20:24] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 47 total children
    [08-Feb-2018 07:15:08] NOTICE: Terminating ...
    [08-Feb-2018 07:15:08] NOTICE: exiting, bye-bye!
    [08-Feb-2018 07:15:13] NOTICE: fpm is running, pid 11711
    [08-Feb-2018 07:15:13] NOTICE: ready to handle connections
    [08-Feb-2018 07:15:14] NOTICE: Terminating ...
    [08-Feb-2018 07:15:14] NOTICE: exiting, bye-bye!
    [08-Feb-2018 07:15:14] NOTICE: fpm is running, pid 11775
    [08-Feb-2018 07:15:14] NOTICE: ready to handle connections
    [08-Feb-2018 07:15:19] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 37 total children

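When this happens, php-fpm does not have enough worker processes and the number needs to be increased.

3. Enable the php-fpm slow log

The php-fpm slow log is similar to MySQL's slow query log. The information in it shows which requests are too slow and where they spend their time, which makes it easier to tune PHP performance.

Add the following to php-fpm.conf:

    ; When a request exceeds this timeout, the complete PHP call stack is written to the slow log. '0' means 'Off'
    request_slowlog_timeout = 1
    slowlog = /var/log/php-fpm/php_slow.log

How to read the log:

    script_filename is the entry file
    curl_exec() means the time limit was exceeded while this function was executing
    exfilter_curl_get() means curl_exec() was called from exfilter_curl_get()
    The number after the colon on each line is the line number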

Sample entry:

    [08-Feb-2018 11:45:03] [pool www] pid 457
    script_filename = /home/wwwroot/novel2/public/index.php
    [0x00007f2030c13880] file_exists() /home/wwwroot/novel2/yaf_lib/vendor/composer/ClassLoader.php:384
    [0x00007f2030c13710] findFileWithExtension() /home/wwwroot/novel2/yaf_lib/vendor/composer/ClassLoader.php:351
    [0x00007f2030c13670] findFile() /home/wwwroot/novel2/yaf_lib/vendor/composer/ClassLoader.php:321
    [0x00007f2030c135e0] loadClass() /home/wwwroot/novel2/yaf_lib/vendor/php-curl-class/php-curl-class/src/Curl/Curl.php:1419

Then check php-fpm.log as well to see whether any request took longer than 5 seconds:

[08-Feb-2018 14:16:34] WARNING: [pool www] child 7524, script '/home/wwwroot/novel2/public/admin.php' (request: "GET /admin.php?/admin/fiction/adListData&type=1&draw=1&columns%5B0%5D%5Bdata%5D=app_name&columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=id&columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=keyword&columns%5B2%5D") executing too slow (6.350631 sec), logging
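Scanning php-fpm.log for such entries by eye gets tedious, so here is a sketch that filters them by duration. The regular expression is written against the message format shown above; the first sample line is a shortened copy of that entry, the second is a made-up faster request, and `slow_requests` is a hypothetical helper.

```python
import re

SLOW_RE = re.compile(
    r"\[(?P<time>[^\]]+)\] WARNING: \[pool (?P<pool>\w+)\] child (?P<pid>\d+), "
    r"script '(?P<script>[^']+)' .*executing too slow \((?P<secs>[\d.]+) sec\)"
)

LOG_LINES = [
    "[08-Feb-2018 14:16:34] WARNING: [pool www] child 7524, script "
    "'/home/wwwroot/novel2/public/admin.php' (request: \"GET /admin.php?"
    "/admin/fiction/adListData&type=1\") executing too slow (6.350631 sec), logging",
    "[08-Feb-2018 14:17:02] WARNING: [pool www] child 7530, script "
    "'/home/wwwroot/novel2/public/index.php' (request: \"GET /index.php\") "
    "executing too slow (1.200000 sec), logging",
    "[08-Feb-2018 07:15:13] NOTICE: fpm is running, pid 11711",
]


def slow_requests(lines, threshold=5.0):
    """Return (time, script, seconds) for entries slower than `threshold`."""
    found = []
    for line in lines:
        match = SLOW_RE.search(line)
        if match and float(match.group("secs")) > threshold:
            found.append((match.group("time"), match.group("script"),
                          float(match.group("secs"))))
    return found


for entry in slow_requests(LOG_LINES):
    print(entry)
```

Only the 6.35-second request to admin.php passes the 5-second threshold; the call stack for it can then be looked up in the slow log by timestamp and pid.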

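4. Tuning the number of php-fpm processes

pm controls the number of php-fpm worker processes and has three modes. static creates all processes at once and keeps the number fixed. dynamic changes the number at run time as needed. ondemand is the opposite of dynamic in that it puts memory first; its model is simple: each idle process is killed after it has been idle for pm.process_idle_timeout seconds. The advantage is that if the server receives no requests for a long time, only the php-fpm master process remains. The drawback is that at peak times, or when pm.process_idle_timeout is too short, the server ends up creating processes frequently.

So which one should be chosen?

As a general rule, dynamic suits machines with little memory, because it allocates processes flexibly and saves memory. static suits machines with plenty of memory, because creating and reclaiming processes dynamically also costs server resources. If memory is large, say 8 to 20 GB, then at about 20 MB per php-fpm process, 100 processes already take about 2 GB, so static mode is affordable. If memory is small and other processes need their share as well, use dynamic.

Here we use pm = dynamic:

    pm.max_children = 60
    pm.start_servers = 30
    pm.min_spare_servers = 20
    pm.max_spare_servers = 60
    pm.max_requests = 5000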
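The sizing rule above boils down to one division. The sketch below estimates a ceiling for pm.max_children from a memory budget, using the article's figure of about 20 MB per php-fpm process; the 8 GB machine and the 4 GB reserved for nginx, Redis and the system are assumptions of mine, and the function names are hypothetical.

```python
def pool_memory_mb(children, per_process_mb):
    """Memory used when every child is running."""
    return children * per_process_mb


def max_children(total_mb, reserved_mb, per_process_mb):
    """Largest child count that fits in the memory left after reservations."""
    return max(1, (total_mb - reserved_mb) // per_process_mb)


print(pool_memory_mb(100, 20))        # the article's example: 100 processes
print(pool_memory_mb(60, 20))         # pm.max_children = 60 as configured above
print(max_children(8192, 4096, 20))   # assumed 8 GB machine, half reserved
```

The per-process figure should be measured on the real workload, since it depends on the application and on memory_limit.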

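The memory used by a single process can be limited with either of the following:

    # vim php.ini
    memory_limit = 48M

    or

    # vim php-fpm.conf
    php_admin_value[memory_limit] = 48M

If php-fpm's memory usage on a machine grows too high, for example because of a memory leak, add the following to the php-fpm configuration file:

    emergency_restart_threshold = 30
    emergency_restart_interval = 30s
    process_control_timeout = 10

This means: if child processes exit with SIGSEGV or SIGBUS emergency_restart_threshold = 30 times within emergency_restart_interval = 30s, php-fpm is restarted gracefully.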

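SIGBUS (bus error) means the address the pointer refers to is valid, but the bus cannot use the pointer properly, usually because of an unaligned data access.

SIGSEGV (segmentation fault) means the address the pointer refers to is invalid: no physical memory corresponds to it.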
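The threshold and interval together describe a sliding time window. The following sketch models that rule in a few lines; it is a simplified model of the behaviour described above, not php-fpm's actual implementation, and `CrashWindow` is a name I made up. The demo uses a threshold of 3 instead of 30 to keep it short.

```python
from collections import deque


class CrashWindow:
    """Count SIGSEGV/SIGBUS exits inside a sliding time window."""

    def __init__(self, threshold, interval):
        self.threshold = threshold  # emergency_restart_threshold
        self.interval = interval    # emergency_restart_interval, in seconds
        self.crashes = deque()

    def record(self, timestamp):
        """Record one crash; return True when a restart is due."""
        self.crashes.append(timestamp)
        # Drop crashes that are older than the interval
        while self.crashes and timestamp - self.crashes[0] > self.interval:
            self.crashes.popleft()
        if len(self.crashes) >= self.threshold:
            self.crashes.clear()
            return True
        return False


window = CrashWindow(threshold=3, interval=30)
results = [window.record(t) for t in (0, 10, 50, 60, 70)]
print(results)
```

The crashes at 0 s and 10 s have expired by the time the one at 50 s arrives, so the restart is only triggered by the third crash inside the same 30-second window, at 70 s.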

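process_control_timeout set to 10 is the time limit for child processes to react to signals from the php-fpm master; in other words, the FastCGI workers get 10 seconds to respond to php-fpm.

PHP requests are processed as follows:

The interaction between Nginx and PHP relies on the CGI interface. Because both sides implement it, Nginx can hand an incoming request to PHP and pass the result from PHP back to the client. The most basic CGI implementation starts a new PHP process for every request and closes it when the request is finished, which spends a lot of resources on starting and stopping processes and is therefore inefficient. FastCGI was introduced to fix this: a process is started once and handles many requests before it is closed, which removes the per-request cost of creating and destroying processes. FastCGI has one drawback, though: a process can only handle one request at a time, so if several requests arrive together they have to queue for it. The remedy is simple: run several FastCGI processes. Running several processes raises the question of how to manage them, for example how many to start and how to distribute requests among them. PHP-FPM is exactly such a manager for FastCGI processes. Nginx passes the request to PHP-FPM, and PHP-FPM picks a suitable FastCGI worker to handle it.

Process reuse comes into play when PHP-FPM hands a request to a FastCGI worker. In principle, PHP-FPM picks an idle FastCGI process, and before the request is handled it sends that process a reuse signal so that the process gets ready to accept and handle the request. A FastCGI process is not always able to handle requests, though, which means it may not respond to the reuse signal (for example when it hangs). The parameter above therefore says how long PHP-FPM gives a FastCGI process to respond to the signal; on timeout, PHP-FPM handles the request some other way, for example with a different FastCGI process.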

References

https://github.com/Qihoo360/Atlas/releases

http://hackmysql.com/scripts/mysqlreport-3.5.tgz

http://www.zsythink.net/archives/tag/zabbix/

https://www.devilf.cc/?p=77

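This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.

Originally published: 2018-01-28. In case of infringement, contact cloudcommunity@tencent.com for removal.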
