ARTS打卡第十二周

公众号_松华说

发布于 2019-07-16 11:42:08

6140

发布于 2019-07-16 11:42:08

文章被收录于专栏：松华说

ARTS的初衷

Algorithm: 主要是为了编程训练和学习。

Review：主要是为了学习英文

Tip：主要是为了总结和归纳在是常工作中所遇到的知识点。学习至少一个技术技巧。在工作中遇到的问题，踩过的坑，学习的点滴知识。

Share：主要是为了建立影响力，能够输出价值观。分享一篇有观点和思考的技术文章

https://www.zhihu.com/question/301150832

一、Algorithm

二、Review

http://www.tom-e-white.com/2007/11/consistent-hashing.html 一致性哈希算法

https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8 一致性哈希算法权衡

一致性哈希是用来解决缓存节点删除增加或者微服务架构中的粘性负载均衡后端节点删除增加时，避免大部分节点缓存失效的一种算法。原理是将对象和节点都进行相同hash算法( MurmurHash, MetroHash or SipHash1–3,SHA-1 or MD5)处理后映射到一个圆环中(This should be familiar to every Java programmer – the hashCode method on Object returns an int, which lies in the range -231 to 231-1. Imagine mapping this range into a circle so the values wrap around)，顺时针查找对象映射位置的下一个节点，则是命中的节点，如果没有找到则返回圆环的第一个节点(To find which cache an object goes in, we move clockwise round the circle until we find a cache point)，这样当新增和删除节点时并不会改变原来所有的命中规则

1和4会命中A、2会命中B、3会命中C

当C节点crash和新增D节点时，只会影响3和4的object缓存失效

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {

  private final HashFunction hashFunction;
  private final int numberOfReplicas;
  private final SortedMap<Integer, T> circle =
    new TreeMap<Integer, T>();

  public ConsistentHash(HashFunction hashFunction,
    int numberOfReplicas, Collection<T> nodes) {

    this.hashFunction = hashFunction;
    this.numberOfReplicas = numberOfReplicas;

    for (T node : nodes) {
      add(node);
    }
  }

  public void add(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.put(hashFunction.hash(node.toString() + i),
        node);
    }
  }

  public void remove(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.remove(hashFunction.hash(node.toString() + i));
    }
  }

  public T get(Object key) {
    if (circle.isEmpty()) {
      return null;
    }
    int hash = hashFunction.hash(key);
    if (!circle.containsKey(hash)) {
      SortedMap<Integer, T> tailMap =
        circle.tailMap(hash);
      hash = tailMap.isEmpty() ?
             circle.firstKey() : tailMap.firstKey();
    }
    return circle.get(hash);
  } 

}

但是当节点个数比较少时，往往会出现数据倾向现象(With a small number of vnodes, different servers could be assigned wildly different numbers of keys.)，所以需要引入虚拟节点(virtual nodes),一般是主机名加编号,数据定位方式不变，只不过命中后要将虚拟节点转换成真实节点(To add the list of nodes to the ring hash, each one is hashed m.replicas times with slightly different names ( 0 node1, 1 node1, 2 node1, …))，Dubbo服务框架中的ketama算法也是采用的这种思想

然而环哈希算法依旧会导致节点不平衡(the load distribution across the nodes can still be uneven),重复hash也会导致内存开销增大。于是Google公司于2014年提出Jump Hash，设计思路是当桶的数量变化时，只需要把一些对象从旧桶移动到新桶，不需要做其它移动

func Hash(key uint64, numBuckets int) int32 {
  var b int64 = -1
  var j int64
  for j < int64(numBuckets) {
    b = j
    key = key*2862933555777941757 + 1
    j = int64(float64(b+1) *
      (float64(int64(1)<<31) / float64((key>>33)+1)))
  }
  return int32(b)
}

然而依旧是不足的，输入是桶数量，输出是对应的序列号(The main limitation is that it only returns an integer in the range 0..numBuckets-1. It doesn’t support arbitrary bucket names),另外只能增加或者删除最后的节点，不支持任意节点删除，也就是说中间节点失效时，失效节点的数据并不会重新平均分配(you can only properly add and remove nodes at the upper end of the range)，所以它适用于分布式主备存储类系统中(These combined make Jump Hash better suited for data storage applications where you can use replication to mitigate node failure)

当然还有其他杰出的算法如Multi-Probe Consistent Hashing、Rendezvous Hashing、Maglev Hashing，具体的大家网上查阅吧

三、Tip

1、推荐一个不会被封的V**，justmysocks1.net，没有服务器，提供了五个域名节点，专属优惠码：BWH26FXH3HIQ 2、有些服务的获取主机方式非常反人类，不像Dubbo中的主机绑定，考虑很多的情况，避免因为无法获取主机导致不可用(具体代码就不贴了)

Register & bind IP address for service provider, can be configured separately. Configuration priority: environment variables -> java system properties -> host property in config file ->/etc/hosts -> default network address -> first available network address