NoSQL Distilled, Chapter 5: Consistency




Chapter 5. Consistency

One of the biggest changes from a centralized relational database to a cluster-oriented NoSQL database is in how you think about consistency. Relational databases try to exhibit strong consistency by avoiding all the various inconsistencies that we’ll shortly be discussing. Once you start looking at the NoSQL world, phrases such as “CAP theorem” and “eventual consistency” appear, and as soon as you start building something you have to think about what sort of consistency you need for your system.


Consistency comes in various forms, and that one word covers a myriad of ways errors can creep into your life. So we’re going to begin by talking about the various shapes consistency can take. After that we’ll discuss why you may want to relax consistency (and its big sister, durability).


5.1. Update Consistency

We’ll begin by considering updating a telephone number. Coincidentally, Martin and Pramod are looking at the company website and notice that the phone number is out of date. Implausibly, they both have update access, so they both go in at the same time to update the number. To make the example interesting, we’ll assume they update it slightly differently, because each uses a slightly different format. This issue is called a write-write conflict: two people updating the same data item at the same time.


When the writes reach the server, the server will serialize them—decide to apply one, then the other. Let’s assume it uses alphabetical order and picks Martin’s update first, then Pramod’s. Without any concurrency control, Martin’s update would be applied and immediately overwritten by Pramod’s. In this case Martin’s is a lost update. Here the lost update is not a big problem, but often it is. We see this as a failure of consistency because Pramod’s update was based on the state before Martin’s update, yet was applied after it.
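
The lost update can be reproduced in a few lines. The following is a minimal sketch, assuming a hypothetical in-memory store with no concurrency control; the `store`, `read`, and `blind_write` names and the phone numbers are invented for illustration.

```python
# Hypothetical in-memory store with no concurrency control.
store = {"phone": "555 1234"}


def read(key):
    return store[key]


def blind_write(key, value):
    # Overwrites the stored value without checking what the writer last read.
    store[key] = value


# Both editors read the same out-of-date value before either writes.
martin_saw = read("phone")
pramod_saw = read("phone")

# The server serializes the writes: Martin's first, then Pramod's.
blind_write("phone", "(555) 867-5309")  # Martin's format
blind_write("phone", "555-867-5309")    # Pramod's format, based on the old state

print(read("phone"))  # Martin's write is gone
```

Pramod's write was prepared against the value he read earlier, yet nothing in `blind_write` notices that the stored value has changed in the meantime.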



Approaches for maintaining consistency in the face of concurrency are often described as pessimistic or optimistic. A pessimistic approach works by preventing conflicts from occurring; an optimistic approach lets conflicts occur, but detects them and takes action to sort them out. For update conflicts, the most common pessimistic approach is to have write locks, so that in order to change a value you need to acquire a lock, and the system ensures that only one client can get a lock at a time. So Martin and Pramod would both attempt to acquire the write lock, but only Martin (the first one) would succeed. Pramod would then see the result of Martin’s write before deciding whether to make his own update.
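
As a rough illustration of the write-lock approach, here is a sketch that uses Python's `threading.Lock` inside a single process. The `LockedRecord` class, the `digits` helper, and the `decide` callback are assumptions made for this example; a real database would manage locks across client connections rather than threads.

```python
import threading


def digits(number):
    """Keep only the digits, so two formats of one number compare equal."""
    return "".join(ch for ch in number if ch.isdigit())


class LockedRecord:
    """Hypothetical record guarded by a single write lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def update(self, decide):
        # Hold the lock across read, decision, and write, so no other
        # client can change the value in between.
        with self._lock:
            new_value = decide(self._value)
            if new_value is not None:
                self._value = new_value
            return self._value


record = LockedRecord("555 1234")

# Martin gets the lock first and writes his version.
record.update(lambda current: "(555) 867-5309")


def pramod_decides(current):
    # Pramod now sees Martin's write and skips his own update if the
    # number is already correct apart from formatting.
    mine = "555-867-5309"
    return None if digits(current) == digits(mine) else mine


final_value = record.update(pramod_decides)
print(final_value)
```

Because Pramod's decision runs only after he holds the lock, it is always based on the latest value, at the cost of making him wait while Martin holds it.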


A common optimistic approach is a conditional update, where any client that does an update tests the value just before updating it to see if it’s changed since the client’s last read. In this case, Martin’s update would succeed but Pramod’s would fail. The error would let Pramod know that he should look at the value again and decide whether to attempt a further update.
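
A conditional update can be sketched as a compare-then-write operation. The `Record` class and `ConflictError` below are hypothetical names for this example. The sketch runs single-threaded, so the check and the write cannot be interleaved; a real server would have to make the two steps atomic.

```python
class ConflictError(Exception):
    """Raised when the stored value changed since the client's last read."""


class Record:
    def __init__(self, value):
        self.value = value

    def conditional_update(self, expected, new_value):
        # Apply the write only if the value is still what the client read.
        # A real server would make this check-and-write atomic.
        if self.value != expected:
            raise ConflictError(f"expected {expected!r}, found {self.value!r}")
        self.value = new_value


record = Record("555 1234")
martin_saw = record.value
pramod_saw = record.value

record.conditional_update(martin_saw, "(555) 867-5309")  # succeeds

try:
    record.conditional_update(pramod_saw, "555-867-5309")
    pramod_conflict = False
except ConflictError:
    pramod_conflict = True  # Pramod must re-read and decide again
```

Comparing whole values is the simplest test; many systems compare a version number or timestamp instead, which works the same way.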



Both the pessimistic and optimistic approaches that we’ve just described rely on a consistent serialization of the updates. With a single server, this is obvious: it has to choose one, then the other. But if there’s more than one server, such as with peer-to-peer replication, then two nodes might apply the updates in a different order, resulting in a different value for the telephone number on each peer. Often, when people talk about concurrency in distributed systems, they talk about sequential consistency, which means ensuring that all nodes apply operations in the same order.
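
The effect of ordering can be shown with two simulated peers. This sketch assumes each update carries a hypothetical sequence number handed out by some agreed source; how a cluster would actually agree on that order is outside its scope.

```python
def apply_updates(initial, updates):
    """Apply whole-value updates in the given order; the last one wins."""
    value = initial
    for new_value in updates:
        value = new_value
    return value


# Each update is (sequence number, new value). The sequence numbers are
# an assumption of this sketch, assigned by one agreed source.
martin = (1, "(555) 867-5309")
pramod = (2, "555-867-5309")

# The peers receive the updates in different orders and apply them as received.
node_a = apply_updates("555 1234", [v for _, v in [martin, pramod]])
node_b = apply_updates("555 1234", [v for _, v in [pramod, martin]])
diverged = node_a != node_b

# If both peers sort by the agreed sequence number first, they converge.
node_a_ordered = apply_updates("555 1234", [v for _, v in sorted([martin, pramod])])
node_b_ordered = apply_updates("555 1234", [v for _, v in sorted([pramod, martin])])
converged = node_a_ordered == node_b_ordered

print(diverged, converged)
```

Without an agreed order the two peers end up holding different phone numbers; with one, both settle on the same value regardless of arrival order.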


There is another optimistic way to handle a write-write conflict—save both updates and record that they are in conflict. This approach is familiar to many programmers from version control systems, particularly distributed version control systems that by their nature will often have conflicting commits. The next step again follows from version control: You have to merge the two updates somehow. Maybe you show both values to the user and ask them to sort it out—this is what happens if you update the same contact on your phone and your computer. Alternatively, the computer may be able to perform the merge itself; if it was a phone formatting issue, it may be able to realize that and apply the new number with the standard format. Any automated merge of write-write conflicts is highly domain-specific and needs to be programmed for each particular case.
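
For the phone-number case, an automated merge might look like the sketch below. The `merge_phone` function, the ten-digit assumption, and the `NNN-NNN-NNNN` standard format are all invented for this example, which is exactly the kind of domain-specific rule the text describes.

```python
def digits(number):
    """Keep only the digits of a phone number."""
    return "".join(ch for ch in number if ch.isdigit())


def merge_phone(siblings):
    """Try to merge conflicting phone values that were saved side by side.

    Returns the number in an assumed standard format if the conflict is
    only about formatting, or None if a person has to sort it out.
    """
    distinct = {digits(value) for value in siblings}
    if len(distinct) != 1:
        return None  # the numbers really differ
    number = distinct.pop()
    if len(number) != 10:
        return None  # not a shape this sketch knows how to format
    return f"{number[:3]}-{number[3:6]}-{number[6:]}"


# Both updates were kept and flagged as conflicting.
formatting_only = merge_phone(["(555) 867-5309", "555.867.5309"])
real_conflict = merge_phone(["(555) 867-5309", "555-867-5310"])

print(formatting_only)
print(real_conflict)
```

When `merge_phone` returns `None`, the application would fall back to showing both values to the user.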


Often, when people first encounter these issues, their reaction is to prefer pessimistic concurrency because they are determined to avoid conflicts. While in some cases this is the right answer, there is always a tradeoff. Concurrent programming involves a fundamental tradeoff between safety (avoiding errors such as update conflicts) and liveness (responding quickly to clients). Pessimistic approaches often severely degrade the responsiveness of a system to the degree that it becomes unfit for its purpose. This problem is made worse by the danger of errors: pessimistic concurrency often leads to deadlocks, which are hard to prevent and debug.


Replication makes it much more likely to run into write-write conflicts. If different nodes have different copies of some data which can be independently updated, then you’ll get conflicts unless you take specific measures to avoid them. Using a single node as the target for all writes for some data makes it much easier to maintain update consistency. Of the distribution models we discussed earlier, all but peer-to-peer replication do this.
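
One way to send all writes for a piece of data to a single node is to derive the target from the key. This is only a sketch: the `primary_for` function and the node names are hypothetical, and it ignores node failures and changes in cluster membership.

```python
import zlib


def primary_for(key, nodes):
    """Pick one node as the write target for a key, deterministically."""
    # crc32 is stable across processes, unlike Python's built-in hash()
    # for strings, so every client computes the same target.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]

# Martin's and Pramod's clients both route the write to the same node,
# which can then serialize the two updates on its own.
martin_target = primary_for("company:phone", nodes)
pramod_target = primary_for("company:phone", nodes)

print(martin_target == pramod_target)
```

Since both writes arrive at one node, that node can apply a write lock or a conditional update without coordinating with its peers.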


