
A simple shell script for automated inspection of multiple Linux servers, so you can stop repeating the same work every day!

程序猿的栖息地
Published 2022-04-29 15:47:33

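When you operate a project with twenty-odd servers (or more), the daily performance inspection means checking that each server's CPU, memory, and disk space are within their normal ranges. To cut down on this work, which repeats every day or at some other fixed interval, I wrote automated inspection scripts for Linux servers. Schedule them in crontab to run at a fixed time and most of the manual effort goes away.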

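Environment:

The servers on my project are mainly Linux and AIX machines, about 30 in total. At present they are inspected twice a week, by logging in to each server manually and running the relevant commands to view its performance figures.

Approach:

1. All servers are on the same LAN and can reach one another.

2. Pick one server with relatively good performance, or one under light load, to act as the inspection server.

3. Use this server to inspect all the others, and record the results on the inspection server.

4. Name each server's result file after the time and the IP so the files can be told apart, then compress all results into one archive.

5. Maintenance staff only need to fetch this archive at regular intervals and read the results, instead of logging in to every server and typing the same commands.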
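
Steps 3 to 5 can be sketched as one small function. This is a minimal sketch rather than the script presented later in this article: it assumes key-based SSH login is already configured (the later scripts use passwords with expect instead), and `REMOTE_RUNNER`, `collect_reports`, and the file naming are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch of steps 3-5: run one command per host, store each result under a
# name built from the date and the IP, then pack everything into one archive.
# REMOTE_RUNNER is a hypothetical hook (not from the article); by default it
# is plain ssh, which assumes key-based login is already configured.
REMOTE_RUNNER=${REMOTE_RUNNER:-ssh}

collect_reports() {
    hosts_file=$1      # one IP address per line (extra columns are ignored)
    out_dir=$2         # directory that receives the per-host reports
    remote_cmd=$3      # command to run on every host, e.g. "df -h"
    stamp=$(date +%Y%m%d)

    mkdir -p "$out_dir" || return 1
    while read -r ip rest; do
        [ -n "$ip" ] || continue                     # skip blank lines
        # </dev/null keeps ssh from swallowing the rest of the host list
        $REMOTE_RUNNER "$ip" "$remote_cmd" \
            >"$out_dir/${stamp}_${ip}.txt" 2>&1 </dev/null
    done <"$hosts_file"

    # Pack all reports into a single archive placed next to out_dir
    tar -czf "${out_dir%/}_${stamp}.tar.gz" -C "$out_dir" .
}
```

For example, `collect_reports hosts.txt /tmp/reports 'df -h'` would leave one file per host in /tmp/reports and an archive named /tmp/reports_<date>.tar.gz beside it.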

Everyday Linux inspection commands

hostname
uname -a
netstat -rn
ifconfig -a
cat /etc/sysconfig/hwconf
cat /proc/meminfo
cat /proc/cpuinfo
cat /proc/swaps
sfdisk -g
df -k
dmesg
more /var/log/boot.log
more /var/log/messages
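
Some of these commands or files may be missing on a given distribution, so a wrapper that labels each command and skips the ones that are not installed can keep one inspection run from aborting halfway. The following is a minimal sketch; `run_check` is a hypothetical helper name, not part of the original scripts.

```shell
# run_check CMD [ARGS...]: print a banner, then run the command if it exists
# on this host; otherwise print a note and carry on.
run_check() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "===== $* ====="
        "$@" 2>&1
    else
        echo "===== $1: not installed, skipped ====="
    fi
}
```

It would be called once per command, for example `run_check df -k` or `run_check sfdisk -g`.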

Daily inspection script for Linux servers

1. Run on a schedule on each server to be inspected:

#!/bin/sh
# Change to the working directory first so every temporary file lands in it
cd /home/wjlcn/monitor/check/ || exit 1
date=`date +%c`
filename=`hostname`_check_`date +%Y%m%d`.txt
echo "------------ daily check begin -----------------" >>dc1.txt
echo "-----------sar -ru 10 3----------------" >>dc1.txt
# NOTE: the line range 21-25 depends on the layout of the local sar output
sar -ru 10 3 |sed -n '21,25p' >>dc1.txt
echo "------------top -d 1 -n 1 -------------" >>dc1.txt
# Keep %CPU and COMMAND (columns 9 and 12) of the first three processes
/usr/bin/top  -b -d 1 -n 1 |sed -n '1,10p' |awk '{print $9,$12}' >top1.txt
sed '1,7d' top1.txt >>dc1.txt

echo "------------free -m ----------------" >>dc1.txt
free -m >>dc1.txt
echo "--------------df -h ---------------" >>dc1.txt
df -h >>dc1.txt
echo "----------  tripwire --check ----------">> dc1.txt
/usr/sbin/tripwire --check|sed -n '10p;18p;33,37p' >>dc1.txt
echo "$date" >>"$filename"
cat dc1.txt >>"$filename"
echo "$date" >>"$filename"
echo "--------------- the end ---------------" >>"$filename"
rm dc1.txt top1.txt
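
The article schedules this script through crontab but does not show the entry itself. As a hedged illustration, the helper below prints one crontab line for a daily run; `cron_entry` and the script path used in the usage note are hypothetical.

```shell
# cron_entry MINUTE HOUR SCRIPT: print one crontab line that runs SCRIPT
# once a day at HOUR:MINUTE and discards its terminal output.
cron_entry() {
    printf '%s %s * * * %s >/dev/null 2>&1\n' "$1" "$2" "$3"
}
```

To append such an entry to the current user's crontab, one could run `(crontab -l 2>/dev/null; cron_entry 30 7 /home/wjlcn/monitor/check/daily_check.sh) | crontab -`, where daily_check.sh stands for wherever the script above was saved.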

2. Upload to the FTP server on a schedule

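#!/bin/sh
# Once the reports are uploaded, all servers can be inspected from the FTP server alone
cd /home/itownet/monitor/check || exit 1
LOGFILE=ftp.log
# -n: no auto-login; -i: no per-file prompt, so mput can run unattended
ftp -n -i >>$LOGFILE <<EOF
open IP
user user  password
binary
cd test/pcreport
mput *.txt
bye
EOF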

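File description

This shell script is intended as a fairly automated way to inspect a large number of Linux servers. It consists of three files: shellsh.sh, checksh.sh, and file.txt. All three must sit in the same directory and be run with root privileges; none can be omitted.

Usage:

Put the IP address and password of every server to be inspected into file.txt, one IP and its password per line. Then run:

./shellsh.sh file.txt 192.168.182.143 123456

file.txt can be replaced by any other file name, 192.168.182.143 is the IP of the server where you want the inspection logs stored, and 123456 is that server's password.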

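Result:

After the run, a directory named GatherLogDirectory is created under /tmp on 192.168.182.143. It holds the inspection logs of the inspected servers, each named after the server's IP, for example 192.168.182.146.log. Two directories are created on each inspected server: CheckScript and LocalServerLogDirectory. CheckScript holds the checksh.sh script, and LocalServerLogDirectory holds the log that checksh.sh produces on that server.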
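
Since every inspected server should end up with one log named after its IP, a quick way to spot servers that failed is to compare file.txt with the contents of GatherLogDirectory. The following is a minimal sketch; `missing_logs` is a hypothetical helper, not part of the original scripts.

```shell
# missing_logs HOSTS_FILE LOG_DIR: print every IP listed in HOSTS_FILE that
# has no non-empty <IP>.log file in LOG_DIR.
missing_logs() {
    while read -r ip rest; do
        [ -n "$ip" ] || continue            # skip blank lines
        [ -s "$2/$ip.log" ] || echo "$ip"   # -s: file exists and is not empty
    done <"$1"
}
```

On the gathering server it could be run as `missing_logs file.txt /tmp/GatherLogDirectory`; no output means every server delivered a log.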

Test results:

I have only tested it on three Linux virtual machines: Ubuntu, RedHat, and Kali. It ran correctly, taking about 3 minutes per server on average.

cat shellsh.sh
#!/bin/bash
login_info=$1
gather_server_ip=$2
gather_server_password=$3
grep_ip=`ifconfig | grep '\([[:digit:]]\{1,3\}\.\)\{3\}[[:digit:]]\{1,3\}' --color=auto -o | sed -e '2,5d'`

GatherPath="/tmp/GatherLogDirectory"
CheckScriptPath="/tmp/CheckScript"


if [ $# -ne 3 ]; then
    echo -e "Wrong number of parameters!\n"
    echo -e "Usage: $0 login_info gather_server_ip gather_server_password\n"
    echo -e "For example: $0 IpAndPassword.txt $grep_ip 123456\n"
    exit 1
fi

# -d tests for a directory; -x would only test the execute/search bit
if [ ! -d "$GatherPath" ];then
    mkdir -p "$GatherPath"
    echo -e "The log's path is: $GatherPath"
fi



# Read "IP password" pairs straight from the file
while read -r server_ip server_password
do
    [ -n "$server_ip" ] || continue
    login_server_command="ssh -o StrictHostKeyChecking=no root@$server_ip"
    scp_gather_server_checksh="scp checksh.sh root@$server_ip:$CheckScriptPath"

# Step 1: log in, create the script directory, and leave the session
/usr/bin/expect<<EOF
        set timeout 20
        spawn $login_server_command
        expect {
                   "*yes/no" { send "yes\r"; exp_continue }
                   "*password:" { send "$server_password\r" }
               }
        expect {
                   "Permission denied" { exit 1 }
                   "#" { send "mkdir -p $CheckScriptPath; exit\r" }
               }
        expect eof
EOF

# Step 2: copy checksh.sh to the inspected server
/usr/bin/expect<<EOF
        set timeout 20
        spawn $scp_gather_server_checksh
        expect {
                   "Connection refused" { exit 1 }
                   "*yes/no" { send "yes\r"; exp_continue }
                   "*password:" { send "$server_password\r" }
               }
        expect {
                   "Permission denied" { exit 1 }
                   "100%" { }
               }
        expect eof
EOF

# Step 3: log in again, run checksh.sh, and leave the session when it is done
/usr/bin/expect<<EOF
        set timeout 60
        spawn $login_server_command
        expect {
                   "*yes/no" { send "yes\r"; exp_continue }
                   "*password:" { send "$server_password\r" }
               }
        expect {
                   "Permission denied" { exit 1 }
                   "#" { send "cd $CheckScriptPath;./checksh.sh $gather_server_ip $gather_server_password; exit\r" }
               }
        expect eof
EOF

done < "$login_info"

cat checksh.sh
#!/bin/bash
########################################################################################
#Function:
#This script checks the system's information,disks's information,performance,etc...of the
#server
#
#Author:
#By Jack Wang
#
#Company:
#ShaanXi Great Wall Information Co.,Ltd.
########################################################################################

########################################################################################
#
#GatherServerIpAddress is the IP address of the server that gathers the check logs
#GatherServerPassword is the password of the server that gathers the check logs
#
########################################################################################
GatherServerIpAddress=$1
GatherServerPassword=$2

########################################################################################
#GetTheIpCommand is a command that you can get the IP address
########################################################################################
GetTheIpCommand=`ifconfig | grep '\([[:digit:]]\{1,3\}\.\)\{3\}[[:digit:]]\{1,3\}' --color=auto -o | sed -e '2,5d'`

########################################################################################
#LogName is the name of your log: the IP address, a dash, and the date
########################################################################################
LogName=`ifconfig|grep '\([[:digit:]]\{1,3\}\.\)\{3\}[[:digit:]]\{1,3\}' --color=auto -o|sed -e '2,5d'``echo "-"``date +%Y%m%d`

########################################################################################
#
#GatherLogPath is a path that collecting log path
#LocalServerLogPath is local log path
#
########################################################################################
GatherServerLogPath="/tmp/GatherLogDirectory"
LocalServerLogPath="/tmp/LocalServerLogDirectory"


########################################################################################
#LinuxOsInformation is a function used to collect the OS's information
########################################################################################
LinuxOsInformation(){
Hostname=`hostname`
UnameA=`uname -a`
OsVersion=`cat /etc/issue | sed '2,4d'`
Uptime=`uptime|awk '{print $3}'|awk -F "," '{print $1}'`
ServerIp=`ifconfig|grep "inet"|sed '2,4d'|awk -F ":" '{print $2}'|awk '{print $1}'`
ServerNetMask=`ifconfig|grep "inet"|sed '2,4d'|awk -F ":" '{print $4}'|awk '{print $1}'`
ServerGateWay=`netstat -r|grep "default"|awk '{print $2}'`
SigleMemoryCapacity=`dmidecode|grep -P -A5 "Memory\s+Device"|grep "Size"|grep -v "Range"|grep '[0-9]'|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
MaximumMemoryCapacity=`dmidecode -t 16|grep "Maximum Capacity"|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
NumberOfMemorySlots=`dmidecode -t 16|grep "Number Of Devices"|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
MemoryTotal=`cat /proc/meminfo|grep "MemTotal"|awk '{printf("MemTotal:%1.0fGB\n",$2/1024/1024)}'|awk -F ":" '{print $2}'`
PhysicalMemoryNumber=`dmidecode|grep -A16 "Memory Device"|grep "Size:"|grep -v "No Module Installed"|grep -v "Range Size:"|wc -l`
ProductName=`dmidecode|grep -A10 "System Information"|grep "Product Name"|awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'`
SystemCPUInfomation=`cat /proc/cpuinfo|grep "name"|cut  -d: -f2|awk '{print "*"$1,$2,$3,$4}'|uniq -c|sed 's/^[ \t]*//g'`

echo -e "Hostname|$Hostname\nUnamea|$UnameA\nOsVersion|$OsVersion\nUptime|$Uptime\nServerIp|$ServerIp\nServerNetMask|$ServerNetMask\nServerGateWay|$ServerGateWay\nSigleMemoryCapacity|$SigleMemoryCapacity\nMaximumMemoryCapacity|$MaximumMemoryCapacity\nNumberOfMemorySlots|$NumberOfMemorySlots\nMemoryTotal|$MemoryTotal\nPhysicalMemoryNumber|$PhysicalMemoryNumber\nProductName|$ProductName\nSystemCPUInformation|$SystemCPUInfomation"

}

PerformanceInfomation (){
CPUIdle=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|grep id|awk '{print $5}'|awk -F "%" '{print $1}'`
CPUloadAverage=`top -d 2 -n 1 -b|grep "load average:"|awk -F ":" '{print $5}'|sed 's/^[ \t]*//g'`
ProcessNumbers=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $3}'`
ProcessRunning=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $8}'`
ProcessSleeping=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $11}'`
ProcessStoping=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $16}'`
ProcessZombie=`top -d 2 -n 1 -b|grep "Tasks"|awk -F "[: ,]" '{print $21}'`
UserSpaceCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $4}'`
SystemSpaceCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $8}'`
ChangePriorityCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $12}'`
WaitingCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $19}'`
HardwareIRQCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $23}'`
SoftwareIRQCPU=`top -d 2 -n 1 -b|grep 'C[Pp][Uu]'|head -1|awk -F "[: ,%]" '{print $27}'`
MemUsed=`top -d 2 -n 1 -b|grep "Mem"|awk -F "[: ,]" '{print $11}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
MemFreeP=`top -d 2 -n 1 -b|grep "Mem"|awk -F "[: ,]" '{print $16}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
MemBuffersP=` top -d 2 -n 1 -b|grep "Mem"|awk -F "[: ,]" '{print $22}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheCachedP=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $24}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheTotal=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $4}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheUsed=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $14}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`
CacheFree=`top -d 2 -n 1 -b|grep "Swap"|awk -F "[: ,]" '{print $18}'|tr -d "a-zA-Z"|awk '{printf("%dM\n",$1/1024)}'`

echo -e "CPUIdle|$CPUIdle\nCPUloadAverage|$CPUloadAverage\nProcessNumbers|$ProcessNumbers\nProcessRunning|$ProcessRunning\nProcessSleeping|$ProcessSleeping\nProcessStoping|$ProcessStoping\nProcessZombie|$ProcessZombie\nUserSpaceCPU|$UserSpaceCPU\nSystemSpaceCPU|$SystemSpaceCPU\nChangePriorityCPU|$ChangePriorityCPU\nWaitingCPU|$WaitingCPU\nHardwareIRQCPU|$HardwareIRQCPU\nSoftwareIRQCPU|$SoftwareIRQCPU\nMemUsed|$MemUsed\nMemFreeP|$MemFreeP\nMemBuffersP|$MemBuffersP\nCacheCachedP|$CacheCachedP\nCacheTotal|$CacheTotal\nCacheUsed|$CacheUsed\nCacheFree|$CacheFree\n"
}

OprateSystemSec () {
echo '======================UserLogin======================'
w

echo '======================FileUsed======================='
df -ah

echo '======================dmesgError====================='
dmesg | grep error

echo '======================dmesgFail======================'
dmesg | grep Fail

echo '======================BootLog========================'
more /var/log/boot.log | grep -v "OK" | sed '1,6d'

echo '======================route -n======================='
route -n
echo '======================iptables -L===================='
iptables -L
echo '======================netstat -lntp=================='
netstat -lntp
echo '======================netstat -antp=================='
netstat -antp
echo '======================netstat -s====================='
netstat -s
echo '======================last==========================='
last
echo '======================du -sh /etc/==================='
du -sh /etc/
echo '======================du -sh /boot/=================='
du -sh /boot/
echo '======================du -sh /dev/==================='
du -sh /dev/
echo '======================df -h=========================='
df -h
echo '======================mount | column -t=============='
mount | column -t

}


TopAndVmstat(){
top -d 2 -n 1 -b
vmstat 1 10
}

CheckGatherLog(){

if [ -f "$LocalServerLogPath/$GetTheIpCommand.log" ];then
       rm -f "$LocalServerLogPath/$GetTheIpCommand.log"
fi

# -d tests that the directory exists; -x would only test the execute/search bit
if [ ! -d "$LocalServerLogPath" ];then
    mkdir -p "$LocalServerLogPath"
fi

if [ ! -f "$LocalServerLogPath/$GetTheIpCommand.log" ];then
    touch $LocalServerLogPath/$GetTheIpCommand.log
    LinuxOsInformation>>$LocalServerLogPath/$GetTheIpCommand.log
    PerformanceInfomation>>$LocalServerLogPath/$GetTheIpCommand.log
    OprateSystemSec>>$LocalServerLogPath/$GetTheIpCommand.log
    TopAndVmstat>>$LocalServerLogPath/$GetTheIpCommand.log
fi
}

CheckGatherLog

SCP_LOG_TO_GATHER_SERVER="scp $LocalServerLogPath/$GetTheIpCommand.log root@$GatherServerIpAddress:$GatherServerLogPath"

/usr/bin/expect<<EOF
        set timeout 50
        spawn $SCP_LOG_TO_GATHER_SERVER
        expect {
                "*yes/no" { send "yes\r"; exp_continue }
                "*password:" { send "$GatherServerPassword\r" }
               }
        expect "100%"
        expect eof

EOF
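
Both scripts derive the local IP by grepping the output of ifconfig and deleting lines 2 to 5, which depends on how many interfaces the host has and in what order they are listed. A hedged alternative, assuming the iproute2 `ip` command is available on the inspected servers, is to parse its one-line-per-address output; `first_ipv4` is a hypothetical helper name.

```shell
# first_ipv4: read the output of `ip -o -4 addr show` on stdin and print the
# first non-loopback IPv4 address without its prefix length.
first_ipv4() {
    awk '$2 != "lo" { split($4, a, "/"); print a[1]; exit }'
}
```

It would be used as `ip -o -4 addr show | first_ipv4`, and the result could stand in for the value of GetTheIpCommand.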

# Format of file.txt
cat file.txt
192.168.182.143  123456
192.168.182.129  123456
192.168.182.146  123456

Note: 192.168.182.143 is the IP of a server to be inspected, and 123456 is that server's password.
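
Because shellsh.sh takes the first field of each line as the IP and the second as the password, a malformed line only shows up later as a failed login. A small pre-flight check can catch such lines first. This is a sketch under the assumption that passwords contain no whitespace; `validate_hosts` is a hypothetical helper.

```shell
# validate_hosts FILE: report lines that are not "IPv4-address password".
# Prints "line N: <text>" for each bad line and returns 1 if any was found.
validate_hosts() {
    awk '
        NF == 0 { next }    # ignore blank lines
        NF != 2 || $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

A run could then be guarded with `validate_hosts file.txt && ./shellsh.sh file.txt 192.168.182.143 123456`.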

cat check_linux.sh
#!/bin/bash

check_process(){
tolprocess=`ps auxf|grep DisplayMa[nager]|wc -l`

#if [ "$tolprocess" -lt "1" ];then
if [ "$tolprocess" -ge "1" ];then
    echo 'process ok'
else
    echo 'fail'
fi
}


check_log(){
if [ -e /etc/syslog-ng/syslog-ng.conf ];then
    conlog=`cat '/etc/syslog-ng/syslog-ng.conf'|grep "10.70.72.253"|wc -l`
    if [ "$conlog" -ge "1" ];then
        echo 'syslog-ng ok'
    fi
elif [ -e /etc/syslog.conf ];then
    conlog=`cat '/etc/syslog.conf'|grep "10.70.72.253"|wc -l`
    if [ "$conlog" -ge "1" ];then
           echo 'syslog ok'
    fi
else
    echo 'log not find or error'
fi
}


check_cpuidle(){
mincpu=`sar -u 2 10|grep all|awk '{print $NF}'|sort -nr|tail -1`

if [ $(echo "${mincpu} < 20" | bc) = 1 ];then
#if [ "$mincpu" -le "20" ];then
    echo 'cpu idle is less than 20% ,please check'
else
    echo 'cpu idle is more than 20%, it is ok '
fi

}


check_mem(){
vmstat 2 10 
}


check_disk(){
# Count fdisk lines that report a problem; zero matches means healthy
chkdsk=`fdisk -l|egrep 'failed|unsynced|unavailable'|wc -l`
if [ "$chkdsk" -eq "0" ];then
    echo 'fdisk check ok '
else
    echo 'fdisk check find error,please check your disk '
fi
}


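check_io(){
# Take the highest %util and await seen in the samples (head -1 after a
# descending sort). Column positions differ between sysstat versions, so
# check the header of `sar -d` before trusting $(NF-2).
util=`sar -d 2 10|egrep -v 'x86|^$|await'|awk '{print $NF}'|sort -nr|head -1`
await=`sar -d 2 10|egrep -v 'x86|^$|await'|awk '{print $(NF-2)}'|sort -nr|head -1`

if [ $(echo "${util} < 80" | bc) = 1 ] && [ $(echo "${await} < 100" | bc) = 1 ] ;then
    echo 'disk io check is fine'
else
    echo 'disk io use too high '
fi

}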


check_swap(){

tolswap=`cat /proc/meminfo|grep SwapTotal|awk '{print $2}'`
#awk '/SwapTotal/{total=$2}/SwapFree/{free=$2}END{print (total-free)/1024}' /proc/meminfo 
useswap=`awk '/SwapTotal/{total=$2}/SwapFree/{free=$2}END{print (total-free)}' /proc/meminfo `

# Guard against division by zero on hosts that have no swap configured
if [ "$tolswap" -eq 0 ];then
    echo 'no swap configured'
    return
fi
util=`awk 'BEGIN{printf "%.1f\n",'$useswap'/'$tolswap'}'`

if [ $(echo "${util} < 0.3" | bc) = 1 ] || [ $(echo "${useswap} < 1024" | bc) = 1 ] ;then
    echo 'swap use is ok '
else
    echo "useswap: $useswap kb, swap util is $util"
fi

}


check_dmesg(){
# Zero matching lines means no known error message was logged
chkdm=`dmesg |egrep 'scsi reset|file system full'|wc -l`
if [ "$chkdm" -eq "0" ];then
    echo 'dmesg test ok '
else
    echo 'dmesg check find error '
fi
}

check_boot(){
# Zero matching lines means no known error message was logged
chkdm=`cat /var/log/boot.msg|egrep 'scsi reset|file system full'|wc -l`
if [ "$chkdm" -eq "0" ];then
    echo 'boot check fine '
else
    echo 'boot check find error '
fi
}

check_inode(){
maxinode=`df -i|awk '{print $5}'|egrep -v 'IUse|-' |sed 's/%//g'|sort -nr|head -1`
if [ $(echo "${maxinode} < 80" | bc) = 1 ];then
    echo 'inode check ok '
else
    echo 'inode used more than 80% '
fi
}

check_df(){
dfuse=`df -HT|awk '{print $6}'|grep -v Use|sed 's/%//g'|sort -nr|head -1`
if [ $(echo "${dfuse} < 80" | bc) = 1 ];then
    echo 'disk used is less than 80% ,it is ok !'
# Exactly 80% falls into the warning band instead of being reported as critical
elif [ $(echo "${dfuse} < 90" | bc) = 1 ];then
    echo 'warning , disk used is 80% or more and less than 90% '
else
    echo ' Critical, disk used is 90% or more '
fi
}


echo '################### check process ###################'
check_process
echo '################### check syslog ####################'
check_log
echo '################### check cpuidle ###################'
check_cpuidle
echo '################### echo memory stat ################'
check_mem
echo '################### check fdisk #####################'
check_disk
echo '################### check io used ###################'
check_io
echo '################### check swap used #################'
check_swap
echo '################### check dmesg #####################'
check_dmesg
echo '################### check inode #####################'
check_inode
echo '################### check disk used #################'
check_df
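
check_linux.sh compares decimal values by piping expressions to bc, which may not be installed on a minimal system. As a hedged alternative, the same comparison can be done with awk alone; `float_lt` is a hypothetical helper name, and an empty argument is treated as 0.

```shell
# float_lt A B: succeed (exit status 0) when A < B, using awk instead of bc.
float_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'
}
```

A check such as the CPU idle test could then be written as `if float_lt "$mincpu" 20; then echo 'cpu idle is less than 20% ,please check'; fi`.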

Originally published 2020-12-04 on the WeChat official account 程序猿的栖息地.
