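MongoDB replica sets synchronize through the oplog, but before 4.4 the oplog could only be given a fixed size through a parameter; unlike the MySQL binlog, there was no way to specify a retention period in days.
This approach causes a number of problems.
MongoDB itself appears to have recognized this, and 4.4 introduced oplogMinRetentionHours: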
New in version 4.4: Specifies the minimum number of hours to preserve an oplog entry, where the decimal values represent the fractions of an hour. For example, a value of 1.5 represents one hour and thirty minutes.
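For pre-4.4 versions, archiving the production oplog without losing any entries therefore becomes a problem of its own.
I wrote a small script, run every 5 minutes, that backs up the oplog of a MongoDB instance. Its logic is as follows: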
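1 Check whether the last_oplog_ts file exists
If it does not exist, write the current oplog last event time to it, take an oplog backup and exit; otherwise read the saved last event time from it.
2 Get the first event timestamp of the current instance's oplog
Run the following command to get the first event timestamp of the current instance's oplog: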
shard1:SECONDARY> rs.printReplicationInfo()
configured oplog size:   10240MB
log length start to end: 1248136secs (346.7hrs)
oplog first event time:  Wed Jul 14 2021 23:19:25 GMT+0800 (CST)
oplog last event time:   Thu Jul 29 2021 10:01:41 GMT+0800 (CST)
now:                     Thu Jul 29 2021 10:01:41 GMT+0800 (CST)
[root@ssg3-db-mongodb-02 ~]# mongo -u*** -p"***" --authenticationDatabase admin --eval "rs.printReplicationInfo()" | grep 'oplog first event' | cut -d' ' -f7,8,9,10,11,12
Jul 14 2021 23:19:25 GMT+0800 (CST)
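Compare the first event time just obtained with the last event time read in step 1. If the difference is below the threshold (1200s by default, adjustable), do the following:
Read the oplog last event time returned by rs.printReplicationInfo() and write it to last_oplog_ts.
Go to step 3.
3 Take the oplog backup
If the first mongodump attempt fails, wait 60s and try again; if it fails again, exit.
Pack and compress the dumped oplog directory.
4 Delete oplog archives older than 7 days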
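Step 4 can be sketched as a small function. This is only a minimal sketch, assuming a find that supports -maxdepth and -delete (GNU find does); the function name prune_old_archives and its three arguments are illustrative and do not appear in the original script.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script):
# delete oplog archives older than a given number of days.
# Usage: prune_old_archives DIR NAME_PREFIX KEEP_DAYS
prune_old_archives() {
    local dir=$1 prefix=$2 keep_days=$3
    # -mtime +N matches files whose age, counted in whole 24-hour periods,
    # is greater than N (so at least N+1 days old).
    find "$dir" -maxdepth 1 -type f -name "${prefix}*.tar.gz" \
        -mtime +"$keep_days" -delete
}
```

A call such as prune_old_archives /data/backup "oplog_${DB_IP}" 7 would cover the same files as the find command at the end of the full script, while restricting the match to regular .tar.gz files in the backup directory itself.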
Note: the rs.printReplicationInfo() calls above can be replaced with db.getReplicationInfo().tFirst and db.getReplicationInfo().tLast respectively.
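As a rough illustration of that note, the two values could be fetched through thin wrappers like the ones below, which avoids the grep and cut post-processing. This is a sketch under assumptions: MONGO is assumed to point at the legacy mongo shell, the function names are made up for this example, and the authentication options used elsewhere in this article are left out and would have to be added for a secured instance.

```bash
#!/bin/bash
# Assumption: MONGO is the legacy mongo shell binary (override via environment).
# Authentication options are deliberately omitted in this sketch.
MONGO=${MONGO:-mongo}

# Hypothetical wrappers: print the oplog first / last event time
# as reported by db.getReplicationInfo().
oplog_first_event() {
    "$MONGO" --quiet --eval 'print(db.getReplicationInfo().tFirst)'
}

oplog_last_event() {
    "$MONGO" --quiet --eval 'print(db.getReplicationInfo().tLast)'
}
```

The output could then be redirected into last_oplog_ts or passed to date for conversion, in place of the grep and cut pipeline.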
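The logic of the script is fairly simple: a single temporary file on disk stores the oplog last event time.
On every run, the script compares the current oplog first event time with that stored value. The difference t1 can be read roughly as the remaining "headroom" of the oplog: at the current oplog generation rate, the entries written after the last backup will start to be overwritten in t1 seconds.
If t1 is below the threshold, the script updates the temporary file and takes an oplog backup.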
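The comparison can be isolated into a tiny function to make the arithmetic explicit. The following is a sketch only; backup_due is a hypothetical name, its arguments are epoch seconds (which GNU date can produce with date +%s -d "<timestamp>"), and the numbers in the worked example are made up.

```bash
#!/bin/bash
# Hypothetical helper: decide whether an oplog backup is due.
#   $1  last event time saved at the previous backup (epoch seconds)
#   $2  current oplog first event time (epoch seconds)
#   $3  threshold in seconds
# Returns 0 (true) when the remaining window t1 is below the threshold.
backup_due() {
    local last_saved=$1 first_now=$2 threshold=$3
    local t1=$(( last_saved - first_now ))
    [ "$t1" -lt "$threshold" ]
}

# Worked example with made-up numbers: the previous backup ended at t=10000
# and the oldest surviving oplog entry is now at t=9000, so t1 = 1000s,
# which is below the 1200s threshold.
if backup_due 10000 9000 1200; then
    echo "backup due"
else
    echo "backup not due"
fi
```

With a 5-minute schedule and a 1200s threshold, several runs fall inside the window before the entries written after the last backup are overwritten, which leaves room for a failed attempt.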
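Applying oplog entries is idempotent, so even if the time ranges of consecutive backup files overlap, replaying them still yields complete data.
The full script follows; adjust it to your own environment.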
#!/bin/bash
source ~/.bash_profile
# Report a pipeline as failed if any stage fails, not only the last one
set -o pipefail
BASE_DIR=/data/backup
MONGO=/usr/local/mongodb/bin/mongo
MONGO_DUMP=/usr/local/mongodb/bin/mongodump
MONGO_PORT=27017
MONGO_HOST=127.0.0.1
LOG=oplog_backup.log
OPLOG_TS_FILE=last_oplog_ts
CUR_TIME=`date`
DB_IP=`/sbin/ip a | grep eth0 | grep inet |awk '{print $2}' | cut -d'/' -f1 | head -1`
BACKUP_TIME=`date +%Y%m%d%H%M`
DB_BACKUP_NAME=oplog_${DB_IP}_${BACKUP_TIME}.tar.gz
BACKUP_TMP_DIR=${BACKUP_TIME}
THRESHOLD=1200
KEEP_DAY=7
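do_exit(){
    # Call this right after the command to check: it exits only on failure,
    # so the script continues with the next step on success
    if [ $? -ne 0 ]; then
        # Hook in your own monitoring/alerting logic here
        exit 1
    fi
}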
oplog_backup(){
    # Dump local.oplog.rs (gzip-compressed) into $BACKUP_TMP_DIR
    $MONGO_DUMP -h $MONGO_HOST:$MONGO_PORT -u backup -p ******* --authenticationDatabase "admin" -d local -c oplog.rs --gzip -o $BACKUP_TMP_DIR
    if [ $? -ne 0 ]; then
        # First attempt failed: discard the partial dump, wait 60s and retry once
        rm -rf $BACKUP_TMP_DIR
        sleep 60
        $MONGO_DUMP -h $MONGO_HOST:$MONGO_PORT -u backup -p ******* --authenticationDatabase "admin" -d local -c oplog.rs --gzip -o $BACKUP_TMP_DIR
        do_exit
    fi
    # Pack the dump directory into one archive, then remove the directory
    tar -zcvf $DB_BACKUP_NAME $BACKUP_TMP_DIR
    do_exit
    rm -rf $BACKUP_TMP_DIR
}
if [ ! -d "$BASE_DIR" ]; then
    mkdir -p "$BASE_DIR"
fi
cd $BASE_DIR
# last_oplog_ts does not exist: create it and take an oplog backup
if [ ! -f "$OPLOG_TS_FILE" ]; then
    $MONGO -u backup -p ******* --authenticationDatabase admin --port "$MONGO_PORT" --eval "rs.printReplicationInfo()" | grep 'oplog last event' | cut -d' ' -f7,8,9,10,11,12 > $OPLOG_TS_FILE
    do_exit
    echo "`date` $OPLOG_TS_FILE not exists, initiate it and take an oplog backup" >> $LOG
    oplog_backup
    echo "`date` oplog backup succeed" >> $LOG
    exit 0
fi
LAST_EVENT=$(cat $OPLOG_TS_FILE)
# last_oplog_ts is empty (no last event recorded): the oplog backup fails and the script exits
if [ -z "$LAST_EVENT" ]; then
    echo 1 > /data/backup/dumperr.log
    echo "`date` LAST_EVENT in $OPLOG_TS_FILE is NULL and oplog backup fails" >> $LOG
    exit 1
fi
# calculate the diff in seconds, and take an oplog backup when it is lower than the threshold
FIRST_EVENT=`$MONGO -u backup -p ******* --authenticationDatabase admin --port "$MONGO_PORT" --eval "rs.printReplicationInfo()" | grep 'oplog first event' | cut -d' ' -f7,8,9,10,11,12`
E1=$(date '+%s' -d "${LAST_EVENT}")
E2=$(date '+%s' -d "${FIRST_EVENT}")
DIFF=`expr $E1 - $E2`
echo "the time diff now is ${DIFF}s and the threshold is ${THRESHOLD}s"
if [ $DIFF -lt $THRESHOLD ]; then
    echo "`date` the diff is lower than ${THRESHOLD}, take an oplog backup" >> $LOG
    $MONGO -u backup -p ******* --authenticationDatabase admin --port "$MONGO_PORT" --eval "rs.printReplicationInfo()" | grep 'oplog last event' | cut -d' ' -f7,8,9,10,11,12 > $OPLOG_TS_FILE
    do_exit
    oplog_backup
    echo "`date` oplog backup succeed" >> $LOG
fi
# remove oplog archives older than KEEP_DAY days
find $BASE_DIR -name "oplog_${DB_IP}*" -mtime +${KEEP_DAY} | xargs rm -rf
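The "try, wait 60s, try once more" pattern used in oplog_backup could also be factored into a generic helper. The sketch below is one possible shape and not the author's code; retry_once is a hypothetical name, and unlike oplog_backup it does not clean up a partial dump directory between the two attempts.

```bash
#!/bin/bash
# Hypothetical helper: run a command; if it fails, wait and try exactly once more.
# Usage: retry_once DELAY_SECONDS COMMAND [ARGS...]
# The return status is that of the last attempt that was made.
retry_once() {
    local delay=$1
    shift
    # First attempt: return immediately on success
    "$@" && return 0
    # First attempt failed: wait, then make the second and final attempt
    sleep "$delay"
    "$@"
}
```

A caller would pass the delay and the full command line, for example retry_once 60 followed by the mongodump invocation, and then check the return status as the script already does with do_exit.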
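Author: 任坤
Based in Zhuhai, he has worked as a dedicated Oracle DBA and then as a MySQL DBA, and is now mainly responsible for maintaining MySQL, MongoDB and Redis.
This article is shared from the Mongoing中文社区 WeChat official account.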