Consistent hashing

What is libconhash?

libconhash is a consistent hashing library that can be compiled on both Windows and Linux, with the following features:

  1. High performance and easy to use; libconhash uses a red-black tree to manage all nodes to achieve high performance.
  2. By default it uses the MD5 algorithm, but it also supports user-defined hash functions.
  3. Easy to scale according to each node's processing capacity.

Consistent hashing

Why do you need consistent hashing?

First, consider the common way to do load balancing. The machine chosen to cache object o will be:

hash(o) mod n

Here, n is the total number of cache machines. This works well until you add or remove cache machines:

  1. When you add a cache machine, object o will be cached into the machine: hash(o) mod (n+1)
  2. When you remove a cache machine, object o will be cached into the machine: hash(o) mod (n-1)

So you can see that almost all objects will be hashed into new locations. This is a disaster, since the originating content servers are swamped with requests from the cache machines. And this is why you need consistent hashing.
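To make the disruption concrete, here is a minimal Python simulation (not part of libconhash) of the naive hash(o) mod n scheme, counting how many objects change machines when a single machine is added:

```python
import hashlib

def machine(obj, n):
    # Naive scheme: hash the object and take it modulo the machine count
    h = int(hashlib.md5(obj.encode()).hexdigest(), 16)
    return h % n

objects = [f"object{i}" for i in range(10000)]
n = 10
moved = sum(machine(o, n) != machine(o, n + 1) for o in objects)
print(f"{moved / len(objects):.0%} of objects moved")  # roughly 90% move
```

With n machines, an object keeps its location only when hash(o) mod n equals hash(o) mod (n+1), which happens for roughly 1/(n+1) of the objects, so nearly all of them move.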

Consistent hashing guarantees that when a cache machine is removed, only the objects cached on it will be rehashed; and when a new cache machine is added, only a small number of objects will be rehashed.

Now we will go into consistent hashing step by step.

Hash space

Commonly, a hash function maps a value to a 32-bit key in the range 0 ~ 2^32 - 1. Now imagine bending the range into a circle: the key space wraps around, so 0 is followed by 2^32 - 1, as illustrated in figure 1.

Figure 1

Map object into hash space

Now consider four objects, object1 through object4. We use a hash function to compute their key values and map them onto the circle, as illustrated in figure 2.

Figure 2

hash(object1) = key1;
.....
hash(object4) = key4;
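A minimal sketch of this hash step in Python (MD5, as libconhash uses by default; truncating the digest to its first 4 bytes to get a 32-bit key is an illustrative choice, not libconhash's exact scheme):

```python
import hashlib

def ring_key(name):
    # Map a name to a 32-bit key on the ring [0, 2**32 - 1]
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big")

for obj in ["object1", "object2", "object3", "object4"]:
    print(obj, "->", ring_key(obj))
```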

Map the cache into hash space

The basic idea of consistent hashing is to map the cache and objects into the same hash space using the same hash function.

Now suppose we have three caches, A, B and C; the mapping result will look like figure 3.

hash(cache A) = key A;
....
hash(cache C) = key C;

Figure 3

Map objects into cache

Now all the caches and objects are hashed into the same space, so we can determine how to map objects to caches. Take object obj for example: start from where obj is and head clockwise on the ring until you find a cache. If that cache is down, go to the next one, and so forth. See figure 3 above.

According to this method, object1 will be cached into cache A; object2 and object3 will be cached into cache C; and object4 will be cached into cache B.
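The clockwise search can be sketched with a sorted list and binary search (libconhash itself uses a red-black tree for this; the Python version below is only an illustration, and which object lands on which cache depends on the actual hash values, so figure 3 is just one possible layout):

```python
import bisect
import hashlib

def ring_key(name):
    # 32-bit key on the ring, from the first 4 bytes of the MD5 digest
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

class ConsistentHashRing:
    def __init__(self, caches):
        # Points on the ring, kept sorted by key
        self.points = sorted((ring_key(c), c) for c in caches)
        self.keys = [k for k, _ in self.points]

    def lookup(self, obj):
        # First cache point clockwise from the object's key,
        # wrapping around to the start of the ring if needed
        i = bisect.bisect_right(self.keys, ring_key(obj)) % len(self.points)
        return self.points[i][1]

ring = ConsistentHashRing(["cache A", "cache B", "cache C"])
print("object1 ->", ring.lookup("object1"))
```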

Add or remove cache

Now consider two scenarios: a cache goes down and is removed, and a new cache is added.

If cache B is removed, then only the objects cached in B will be rehashed and moved to C; in the example, see object4, as illustrated in figure 4.

Figure 4

If a new cache D is added, and D is hashed between object2 and object3 on the ring, then only the objects between D and B will be rehashed; in the example, see object2, as illustrated in figure 5.

Figure 5
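The claim that adding a cache only remaps the objects it captures can be checked with a small simulation (the helper names below are illustrative, and the exact fraction moved depends on where the new cache happens to hash):

```python
import bisect
import hashlib

def ring_key(name):
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def lookup(points, obj):
    # points: sorted list of (key, cache); clockwise successor with wraparound
    keys = [k for k, _ in points]
    i = bisect.bisect_right(keys, ring_key(obj)) % len(points)
    return points[i][1]

objects = [f"object{i}" for i in range(10000)]
before = sorted((ring_key(c), c) for c in ["cache A", "cache B", "cache C"])
after = sorted(before + [(ring_key("cache D"), "cache D")])

# Every object whose cache changed must have moved onto the new cache D;
# the other caches keep all of their remaining objects untouched
moved = [o for o in objects if lookup(before, o) != lookup(after, o)]
print(f"{len(moved) / len(objects):.0%} of objects moved, all to cache D")
```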

Virtual nodes

If you don't deploy enough caches, the distribution of objects between caches can be very non-uniform. The solution is to introduce the idea of "virtual nodes".

Virtual nodes are replicas of cache points on the circle; each real cache corresponds to several virtual nodes. Whenever we add a cache, we actually create a number of virtual nodes on the circle for it; and when a cache is removed, we remove all of its virtual nodes from the circle.

Consider the above example. There are two caches, A and C, in the system. Now we introduce virtual nodes with a replica count of 2, so there will be 4 virtual nodes: cache A1 and cache A2 represent cache A, and cache C1 and cache C2 represent cache C, as illustrated in figure 6.

Figure 6

Then the mapping from objects to virtual nodes will be:

object1 -> cache A2; object2 -> cache A1; object3 -> cache C1; object4 -> cache C2

Once you find the virtual node, you have the real cache, as in the figure above.

So object1 and object2 are cached into cache A, and object3 and object4 are cached into cache C. The result is more balanced now.
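The virtual-node scheme above can be sketched as follows (a toy Python version; deriving virtual keys by appending the replica number, as in "cache A1", mirrors the example's naming but is an assumption, not libconhash's actual scheme):

```python
import hashlib

def ring_key(name):
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def virtual_points(caches, replicas):
    # Each real cache contributes `replicas` virtual nodes on the ring,
    # e.g. "cache A" -> "cache A1", "cache A2"; every virtual key maps
    # back to its real cache
    points = {}
    for cache in caches:
        for r in range(1, replicas + 1):
            points[ring_key(f"{cache}{r}")] = cache
    return points

points = virtual_points(["cache A", "cache C"], replicas=2)
print(len(points), "virtual nodes")  # 4 virtual nodes for 2 real caches
```

Lookups then proceed exactly as before, but over the virtual points; more points per cache smooths out the arcs each real cache owns.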

So now you know what consistent hashing is.

This article is shared from the WeChat official account 只喝牛奶的杀手 (killerhub); author: sparkliang.


Originally published: 2018-12-31.
