# zookeeper学习系列：四、Paxos算法和zookeeper的关系

Paxos 这个算法是Leslie Lamport在1990年提出的一种基于消息传递的一致性算法.Paxos 算法解决的问题是一个分布式系统如何就某个值（决议）达成一致。

part-time parliament Paxos Made Simple里这样描述Paxos算法执行过程：

1. prepare 阶段：
1. proposer 选择一个提案编号 n 并将 prepare 请求发送给 acceptors 中的一个多数派；
2. acceptor 收到 prepare 消息后，如果提案的编号大于它已经回复的所有 prepare 消息，则 acceptor 将自己上次接受的提案回复给 proposer，并承诺不再回复小于 n 的提案；
2. 批准阶段：
1. 当一个 proposer 收到了多数 acceptors 对 prepare 的回复后，就进入批准阶段。它要向回复 prepare 请求的 acceptors 发送 accept 请求，包括编号 n 和根据 P2c 决定的 value（如果根据 P2c 没有已经接受的 value，那么它可以自由决定 value）。
2. 在不违背自己向其他 proposer 的承诺的前提下，acceptor 收到 accept 请求后即接受这个请求。

3. Learn阶段:

• 当各个Acceptor达到一致之后，需要将达到一致的结果通知给所有的Learner.

二、资料证实

There is a very common misunderstanding that the leader election algorithm in zookeeper is paxos or fast paxos. The leader election algorithm is not paxos or fast paxos, please consider the following facts:

1. There is no the concept of proposal number in the leader election in zookeeper. the proposal number is a key concept to paxos. Some one think the epoch is the proposal number, but different followers may produce proposal with the same epoch which is not allowed in paxos.
2. Fast paxos requires at least 3t + 1 acceptors, where t is the number of servers which are allowed to fail [3]. This is conflict with the fact that a zookeeper cluster with 3 servers works well even if one server failed.
3. The leader election algorithm must make sure P1. However paxos does provide such guarantee.
4. In paxos, a leader is also required to achieve progress. There are some similarities between the leader in paxos and the leader in zookeeper. Even if more than one servers believe they are the leader, the consistency is preserved both in zookeeper and in paxos. this is very clearly discussed in [1] and [2].

1）邮件列表

Our protocol instead, has only two phases, just like a two-phase
commit protocol. Of course, for Paxos, we can ignore the first phase in runs in
which we have a single proposer as we can run phase 1 for multiple instances at
a time, which is what Ben called previously Multi-Paxos, I believe. The trick
with skipping phase 1 is to deal with leader switching.

2）出书的访谈

We made a few interesting observations about Paxos when contrasting it to Zab, like problems you could run into if you just implemented Paxos alone. Not that Paxos is broken or anything, just that in our setting, there were some properties it was not giving us. Some people still like to map Zab to Paxos, and they are not completely off, but the way we see it, Zab matches a service like ZooKeeper well.

zk的分布式一致性算法有了个名称叫Zab

3）论文

We use an algorithm that shares some of the character- istics of Paxos, but that combines transaction logging needed for consensus with write-ahead logging needed for data tree recovery to enable an efficient implementa- tion.

zk的选主策略：

there can be at most one leader (proposer) at any time, and we guarantee this by making sure
that a quorum of replicas recognize the leader as a leader by committing to an
epoch change. This change in epoch also allows us to get unique zxids since the
epoch forms part of the zxid.

1. The leader sends a PROPOSAL message, p, to all followers.
2. Upon receiving p, a follower responds to the leader with an ACK, informing the leader that it has accepted the proposal.
3. Uponreceivingacknowledgmentsfromaquorum(thequorumincludestheleader itself), the leader sends a message informing the followers to COMMIT it.

[1] Reed, B., & Junqueira, F. P. (2008). A simple totally ordered broadcast protocol. Second Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2008). Yorktown Heights, NY: ACM. ISBN: 978-1-60558-296-2. [2] Lamport, L. Paxos made simple. ACM SIGACT News 32, 4 (Dec. 2001), 1825. [3] F. Junqueira, Y. Mao, and K. Marzullo. Classic paxos vs. fast paxos: caveat emptor. In Proceedings of the 3rd USENIX/IEEE/IFIP Workshop on Hot Topics in System Dependability (HotDep.07). Citeseer, 2007.

[4]O'Reilly.ZooKeeper.Distributed process coordination.2013

[5] http://agapple.iteye.com/blog/1184023  zookeeper项目使用几点小结

0 条评论

• ### golang mgo的mongo连接池设置：必须手动加上maxPoolSize

本司礼物系统使用了golang的 mongo库 mgo，中间踩了一些坑，总结下避免大家再踩坑 golang的mgo库说明里是说明了开启连接复用的，但观察实验发现...

• ### 空指针错误导致tomcat报404错误

项目代码的异常类型为500 400 没有404错误 线上却偶尔报404错误，导致成功率低于99% 追查发现是由于一个空指针错误，未被捕获抛出指定项目异常 mar...

• ### 没用过分布式锁？那你该好好看看这篇文章。

原文：https://www.cnblogs.com/JJJ1990/p/10496850.html

• ### 关于分布式锁原理的一些学习与思考-redis分布式锁，zookeeper分布式锁

首先分布式锁和我们平常讲到的锁原理基本一样，目的就是确保，在多个线程并发时，只有一个线程在同一刻操作这个业务或者说方法、变量。

• ### 史上最通俗分布式锁解读

首先，分布式锁和我们平常讲到的锁原理基本一样，目的就是确保在多个线程并发时，只有一个线程在同一刻操作这个业务或者说方法、变量。

• ### 基本状态检测 转

netstat -n | awk '/^tcp/ {++S[\$NF]} END {for(a in S) print a, S[a]}'

• ### 色彩细化的迭代次数（CS DM）

颜色细化过程及其对更高维的推广（Weisfeiler-Leman算法）是解决图形同构问题方法的核心子程序。“颜色细化”以迭代方式计算其输入图的顶点的着色。

• ### 使用高防后，服务器还是会受到攻击这是为什么？

近期听墨者安全的客服人员说有些受了DDOS攻击的用户反映，曾使用了某些公司的高防产品，服务器还是会受到攻击，说DDOS攻击防御防不住，为此对DD...

• ### ZooKeeper 安装与启动

要在你的计算机上安装ZooKeeper框架，请访问该链接并下载最新版本的ZooKeeper。 到目前为止，最新稳定版本的ZooKeeper是3.4.12(Zoo...