
A Look at the Maps in Java, Part 1: HashMap (1.8) Source Code Analysis

冬天里的懒猫
Published 2020-08-24 10:54:41

Table of Contents

- 1. Class Structure and Important Properties
    - 1.1 Basic Structure of the Class
    - 1.2 Member Variables and Constants
        - 1.2.1 Constants
        - 1.2.2 Member Variables
            - 1.2.2.1 table
            - 1.2.2.2 entrySet
            - 1.2.2.3 size
            - 1.2.2.4 modCount
            - 1.2.2.5 threshold
            - 1.2.2.6 loadFactor
    - 1.3 Important Inner Classes
        - 1.3.1 Node
        - 1.3.2 TreeNode
        - 1.3.3 View Inner Classes: KeySet, Values, EntrySet
            - 1.3.3.1 KeySet
            - 1.3.3.2 Values
            - 1.3.3.3 EntrySet
        - 1.3.4 Iterators: HashIterator and the Parallel HashMapSpliterators
- 2. Fundamentals
    - 2.1 Basic Structure of HashMap
    - 2.2 Bit Operations in HashMap
        - 2.2.1 Resizing
        - 2.2.2 Bucket Index Calculation
        - 2.2.3 split
        - 2.2.4 Why HashMap's DEFAULT_INITIAL_CAPACITY Is 16
- 3. Constructors
    - 3.1 HashMap()
    - 3.2 HashMap(int initialCapacity)
    - 3.3 HashMap(int initialCapacity, float loadFactor)
    - 3.4 HashMap(Map<? extends K, ? extends V> m)
- 4. Important Methods
    - 4.1 tableSizeFor
    - 4.2 get
    - 4.3 hash
    - 4.4 put
    - 4.5 resize
    - 4.6 treeifyBin
    - 4.7 remove
    - 4.8 clear
    - 4.9 containsValue
- 5. Summary

Whether at a big tech company or a little-known startup, HashMap is a topic you cannot avoid in interviews. As a rule of thumb, if you can talk about HashMap for more than half an hour, the offer is usually within reach. Let's read through the HashMap source code and see why interviewers are so fond of it.

1. Class Structure and Important Properties

1.1 Basic Structure of the Class

At its core, HashMap is still a hash table. An earlier article on common strategies for resolving hash collisions covered the usual options: open addressing, separate chaining, rehashing, and a common overflow area. When discussing ThreadLocal we also saw that ThreadLocalMap resolves collisions with linear probing, a form of open addressing (see the earlier article on WeakReference and the ThreadLocal source code). HashMap, by contrast, is the most representative and most widely used application of separate chaining. In JDK 1.7 and earlier it used plain linked lists exclusively; 1.8 is more elaborate: it starts with linked lists, and once a list grows past a certain length it is converted into a red-black tree.

Let's first look at the basic structure of HashMap using IDEA's Diagrams feature:

As the diagram shows, HashMap extends AbstractMap and implements the Map, Cloneable, and Serializable interfaces.

The source is as follows:

/**
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 *
 * <p>This implementation provides constant-time performance for the basic
 * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
 * disperses the elements properly among the buckets.  Iteration over
 * collection views requires time proportional to the "capacity" of the
 * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
 * of key-value mappings).  Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
 *
 * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 *
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
 *
 * <p>If many mappings are to be stored in a <tt>HashMap</tt>
 * instance, creating it with a sufficiently large capacity will allow
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table.  Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 *
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *
 * <p>The iterators returned by all of this class's "collection view methods"
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 *
 * <p>This class is a member of the
 * <a href="{@docRoot}/../technotes/guides/collections/index.html">
 * Java Collections Framework</a>.
 *
 * @param <K> the type of keys maintained by this map
 * @param <V> the type of mapped values
 *
 * @author  Doug Lea
 * @author  Josh Bloch
 * @author  Arthur van Hoff
 * @author  Neal Gafter
 * @see     Object#hashCode()
 * @see     Collection
 * @see     Map
 * @see     TreeMap
 * @see     Hashtable
 * @since   1.2
 */
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
    
    }

The gist of the Javadoc is:

HashMap is a hash-table-based implementation of the Map interface that provides all of the optional map operations and permits null keys and values. Apart from being unsynchronized and permitting nulls, HashMap is roughly equivalent to Hashtable. HashMap makes no guarantees about iteration order, nor that the order will remain constant over time.

This implementation offers constant-time performance for get and put, assuming the hash function disperses elements evenly among the buckets. Iterating over a collection view takes time proportional to the capacity of the HashMap (the number of buckets) plus its size (the number of key-value mappings). It is therefore important not to set the initial capacity too high (or the load factor too low) when iteration performance matters.

A HashMap has two parameters that affect its performance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at creation time. The load factor measures how full the table is allowed to get before its capacity is automatically increased. When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (its internal structures are rebuilt) to roughly twice the number of buckets.

As a general rule, the default load factor of 0.75 offers a good trade-off between time and space. Higher values reduce the space overhead but increase the lookup cost, which is reflected in most operations, including get and put. When setting the initial capacity, the expected number of entries and the load factor should both be taken into account so as to minimize the number of rehash operations: if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.

If many mappings are to be stored, creating the map with a sufficiently large initial capacity is more efficient than letting it rehash repeatedly as the table grows. Note that many keys sharing the same hashCode() is a sure way to slow down any hash table. To ameliorate the impact, when keys are Comparable, this class may use the comparison order among keys to break ties.

Note that HashMap is not synchronized. If multiple threads access it concurrently and at least one of them modifies it structurally, external synchronization is required. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with an existing key is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

If no such object exists, the map should be wrapped with Collections.synchronizedMap. This is best done at creation time, to prevent accidental unsynchronized access:

Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class's collection view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way other than through the iterator's own remove method, the iterator throws a ConcurrentModificationException. In the face of concurrent modification, the iterator thus fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined time in the future.

Note that the fail-fast behavior cannot be guaranteed: generally speaking, it is impossible to make hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis, so it would be wrong to write a program that depends on this exception for correctness; the fail-fast behavior should be used only to detect bugs.

1.2 Member Variables and Constants

1.2.1 Constants

The first things to understand in the HashMap source are the constants it defines; they are critical to how HashMap behaves. See the following source:

 private static final long serialVersionUID = 362498820763181265L;

    /*
     * Implementation notes.
     *
     * This map usually acts as a binned (bucketed) hash table, but
     * when bins get too large, they are transformed into bins of
     * TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but
     * relay to TreeNode methods when applicable (simply by checking
     * instanceof a node).  Bins of TreeNodes may be traversed and
     * used like any others, but additionally support faster lookup
     * when overpopulated. However, since the vast majority of bins in
     * normal use are not overpopulated, checking for existence of
     * tree bins may be delayed in the course of table methods.
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are
     * ordered primarily by hashCode, but in the case of ties, if two
     * elements are of the same "class C implements Comparable<C>",
     * type then their compareTo method is used for ordering. (We
     * conservatively check generic types via reflection to validate
     * this -- see method comparableClassFor).  The added complexity
     * of tree bins is worthwhile in providing worst-case O(log n)
     * operations when keys either have distinct hashes or are
     * orderable, Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods
     * return values that are poorly distributed, as well as those in
     * which many keys share a hashCode, so long as they are also
     * Comparable. (If neither of these apply, we may waste about a
     * factor of two in time and space compared to taking no
     * precautions. But the only known cases stem from poor user
     * programming practices that are already so slow that this makes
     * little difference.)
     *
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     *
     * The root of a tree bin is normally its first node.  However,
     * sometimes (currently only upon Iterator.remove), the root might
     * be elsewhere, but can be recovered following parent links
     * (method TreeNode.root()).
     *
     * All applicable internal methods accept a hash code as an
     * argument (as normally supplied from a public method), allowing
     * them to call each other without recomputing user hashCodes.
     * Most internal methods also accept a "tab" argument, that is
     * normally the current table, but may be a new or old one when
     * resizing or converting.
     *
     * When bin lists are treeified, split, or untreeified, we keep
     * them in the same relative access/traversal order (i.e., field
     * Node.next) to better preserve locality, and to slightly
     * simplify handling of splits and traversals that invoke
     * iterator.remove. When using comparators on insertion, to keep a
     * total ordering (or as close as is required here) across
     * rebalancings, we compare classes and identityHashCodes as
     * tie-breakers.
     *
     * The use and transitions among plain vs tree modes is
     * complicated by the existence of subclass LinkedHashMap. See
     * below for hook methods defined to be invoked upon insertion,
     * removal and access that allow LinkedHashMap internals to
     * otherwise remain independent of these mechanics. (This also
     * requires that a map instance be passed to some utility methods
     * that may create new nodes.)
     *
     * The concurrent-programming-like SSA-based coding style helps
     * avoid aliasing errors amid all of the twisty pointer operations.
     */

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

This block contains a very important comment. Its gist is as follows:

Implementation notes: this map usually acts as a binned (bucketed) hash table. When an individual bin grows too large, it is transformed into a bin of TreeNodes, structured much like the nodes in java.util.TreeMap. Most methods work with ordinary bins and switch to the TreeNode methods only when applicable (by a simple instanceof check on a node). Tree bins can be traversed and used like any others, but additionally support faster lookup when overpopulated. Since in normal use the vast majority of bins are not overpopulated, the check for tree bins may be deferred in the course of the table methods.

In a tree bin, every element is a TreeNode. Tree bins are ordered primarily by hashCode; when hash codes are equal and two elements implement the same "class C implements Comparable<C>", their compareTo method is used for ordering (generic types are conservatively checked via reflection to validate this; see comparableClassFor). The added complexity of tree bins is worthwhile because it provides worst-case O(log n) operations when keys either have distinct hashes or are orderable. Performance thus degrades gracefully under accidental or malicious usage in which hashCode() returns poorly distributed values, as well as when many keys share a hash code, as long as they are also Comparable. (If neither applies, we may waste about a factor of two in time and space compared to taking no precautions, but the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

Because a TreeNode is about twice the size of a regular node, tree bins are used only when a bin contains enough nodes to warrant it (see TREEIFY_THRESHOLD), and when they become too small (due to removal or resizing) they are converted back to plain bins. With well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hash codes, the number of nodes per bin follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, albeit with a large variance because of resizing granularity. Ignoring variance, the expected frequency of a bin of size k is:

exp(-0.5) * pow(0.5, k) / factorial(k)

The computed values are:

| Bin size k | Probability |
| --- | --- |
| 0 | 0.60653066 |
| 1 | 0.30326533 |
| 2 | 0.07581633 |
| 3 | 0.01263606 |
| 4 | 0.00157952 |
| 5 | 0.00015795 |
| 6 | 0.00001316 |
| 7 | 0.00000094 |
| 8 | 0.00000006 |

Beyond that, the probability is less than one in ten million.
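As a quick check, these probabilities can be reproduced with a few lines of Java. This is a throwaway sketch (the class name PoissonBinSizes is ours, not part of the JDK); it just evaluates the Poisson(0.5) terms iteratively:

public class PoissonBinSizes {
    public static void main(String[] args) {
        double lambda = 0.5;          // average bin load at the default 0.75 load factor
        double p = Math.exp(-lambda); // k = 0 term: exp(-0.5)
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p = p * lambda / (k + 1); // next Poisson term: multiply by lambda / (k + 1)
        }
    }
}

Running it prints exactly the table above.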

The root of a tree bin is normally its first node. However, the root can sometimes be elsewhere (currently only after Iterator.remove), in which case it can be recovered by following parent links (method TreeNode.root()).

All applicable internal methods accept a hash code as an argument (normally supplied by a public method), which lets them call each other without recomputing user hash codes. Most internal methods also accept a "tab" argument, normally the current table, but possibly a new or old one when resizing or converting.

When bin lists are treeified, split, or untreeified, they are kept in the same relative access/traversal order (i.e., the Node.next field) to better preserve locality and to slightly simplify the handling of splits and of traversals that invoke iterator.remove. When comparators are used on insertion, classes and identityHashCodes are compared as tie-breakers to keep a total ordering (or as close as required here) across rebalancings.

The use of, and transitions between, plain and tree modes are complicated by the existence of the subclass LinkedHashMap. Hook methods (defined below) are invoked upon insertion, removal, and access, allowing LinkedHashMap internals to otherwise remain independent of these mechanics. (This also requires passing the map instance to some utility methods that may create new nodes.)

The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all the twisty pointer operations.

This comment is very important: from it we can see why a HashMap bin is treeified once its length exceeds 8. The other important constants are listed in the table below:

| Constant | Value | Description |
| --- | --- | --- |
| DEFAULT_INITIAL_CAPACITY | 1 << 4 = 16 | Default initial capacity; must be a power of two. |
| MAXIMUM_CAPACITY | 1 << 30 = 1073741824 | Maximum capacity, used if a higher value is implicitly specified by either of the constructors with arguments; must be a power of two <= 1 << 30. |
| DEFAULT_LOAD_FACTOR | 0.75 | Load factor used when none is specified in the constructor. |
| TREEIFY_THRESHOLD | 8 | Bin-length threshold for converting a linked list to a red-black tree. Must be greater than 2 and should be at least 8 to mesh with the assumptions in tree removal about converting back to a plain bin on shrinkage. |
| UNTREEIFY_THRESHOLD | 6 | Threshold for converting a red-black tree back to a linked list when a (split) bin shrinks during resize. Must be less than TREEIFY_THRESHOLD, and at most 6 to mesh with shrinkage detection under removal. |
| MIN_TREEIFY_CAPACITY | 64 | Smallest table capacity for which bins may be treeified; below this, the table is resized instead when a bin has too many nodes. Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds. |

1.2.2 Member Variables

The main member variables are as follows:

 /* ---------------- Fields -------------- */

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;
1.2.2.1 table

transient Node<K,V>[] table;

table is an array of Nodes, initialized on first use and resized as necessary. When allocated, its length is always a power of two. (Length zero is also tolerated in some operations, to allow bootstrapping mechanics that are currently not needed.)

Also note that table is declared transient: HashMap implements its own custom serialization rather than relying on default object serialization.

1.2.2.2 entrySet

transient Set<Map.Entry<K,V>> entrySet;

This field caches the entrySet() view. Note that the keySet and values fields used by keySet() and values() live in AbstractMap.

Its element type is Map.Entry, which is an interface; Node is in fact an implementation of that interface.

1.2.2.3 size

transient int size

The total number of key-value mappings contained in this map.

1.2.2.4 modCount

transient int modCount

The number of times this HashMap has been structurally modified. Structural modifications are those that change the number of mappings or otherwise modify the internal structure (e.g., a rehash). This field makes the iterators over the collection views of the HashMap fail-fast: on concurrent structural modification they throw ConcurrentModificationException.

1.2.2.5 threshold

int threshold;

The next size value at which the table is resized (capacity * load factor).

If the table has not been allocated yet, this field instead holds the initial array capacity, or zero to signify DEFAULT_INITIAL_CAPACITY.

1.2.2.6 loadFactor

final float loadFactor;

The load factor for the hash table. It is final and can only be assigned once.

1.3 Important Inner Classes

Reading the HashMap source carefully, we find quite a few inner classes. We have seen this pattern in other source code before: many of these classes share a name with an outer-level class but are implemented differently internally. Let's analyze them one by one.

1.3.1 Node

The Node class implements the Map.Entry interface, which defines methods such as getKey, getValue, setValue, equals, and hashCode.

Node is the basic node of a plain bin. TreeNode is (indirectly) a subclass of it, and in LinkedHashMap, Entry is a subclass of Node.

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Node is the basic element of the linked list in a bucket. Its main fields are key and value, both of generic type. Because it forms a linked list, it also maintains a next pointer to the following element.

Note its hashCode method:

Objects.hashCode(key) ^ Objects.hashCode(value)

In other words, if key and value are the same object, the Node's hash code is 0 (x ^ x == 0).

It also implements equals: two entries are equal when both key and value are identical (==) or equal according to equals.
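A quick way to observe this contract without touching HashMap's package-private Node is AbstractMap.SimpleEntry, which uses the same key-hash XOR value-hash definition required by the Map.Entry contract (an illustrative sketch; the class name EntryHashDemo is ours):

import java.util.AbstractMap;
import java.util.Map;

public class EntryHashDemo {
    public static void main(String[] args) {
        String s = "same";
        // Map.Entry defines hashCode as hash(key) ^ hash(value),
        // so an entry whose key and value are the same object hashes to 0.
        Map.Entry<String, String> e = new AbstractMap.SimpleEntry<>(s, s);
        System.out.println(e.hashCode()); // prints 0
    }
}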

1.3.2 TreeNode

TreeNode is the basic node of a bin after it has been treeified. Note that TreeNode extends LinkedHashMap.Entry, which in turn extends Node. Since TreeNode is rather complex, its operations are covered in detail later.

TreeNode's inheritance chain is: TreeNode -> LinkedHashMap.Entry -> HashMap.Node -> Map.Entry.

1.3.3 View Inner Classes: KeySet, Values, EntrySet

HashMap contains a family of view-style inner classes: KeySet, Values, and EntrySet.

1.3.3.1 KeySet

HashMap provides a view of all the keys it contains. KeySet extends AbstractSet and is backed by the same hash table of Node (or TreeNode) entries described above.

/**
 * Returns a {@link Set} view of the keys contained in this map.
 * The set is backed by the map, so changes to the map are
 * reflected in the set, and vice-versa.  If the map is modified
 * while an iteration over the set is in progress (except through
 * the iterator's own <tt>remove</tt> operation), the results of
 * the iteration are undefined.  The set supports element removal,
 * which removes the corresponding mapping from the map, via the
 * <tt>Iterator.remove</tt>, <tt>Set.remove</tt>,
 * <tt>removeAll</tt>, <tt>retainAll</tt>, and <tt>clear</tt>
 * operations.  It does not support the <tt>add</tt> or <tt>addAll</tt>
 * operations.
 *
 * @return a set view of the keys contained in this map
 */
public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

final class KeySet extends AbstractSet<K> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
    public final Spliterator<K> spliterator() {
        return new KeySpliterator<>(HashMap.this, 0, -1, 0, 0);
    }
    public final void forEach(Consumer<? super K> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0 && (tab = table) != null) {
            int mc = modCount;
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next)
                    action.accept(e.key);
            }
            if (modCount != mc)
                throw new ConcurrentModificationException();
        }
    }
}

其注释大意为,提供了一个包含map中全部key的视图,需要注意的是,这里仅仅是一个视图,对集合的任何操作都会反应到视图中,同理,在视图中的任何操作也会反馈到Map上。可以在视图中对元素进行删除等操作,具体支持的操作有Iterator.remove、Set.remove、removeAll、retainAll和clear操作。但是并不支持add和addAll操作。

实际上通过源码可以看到,支持的这些操作只是在类中对Map本身属性或者方法的封装。

我们对KeySet的最基本的操作就是通过keySet获取一个迭代器,之后对其中的key进行迭代。通过源代码可以发现,如果调用KeySet.contains和调用EntrySet.contains实际上都是在对哈希表中的table进行操作。只是在返回的时候在forEach中的accept方法中只传入了key:

 action.accept(e.key);

这是keySet与valueSet、EntrySet最大的区别。

因此我们在遍历HashMap的时候直接通过EntrySet就能完成。而不是很多人认为的需要先遍历Key再get。
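A small sketch contrasting the two traversal styles (illustrative only; the map contents are arbitrary):

import java.util.HashMap;
import java.util.Map;

public class TraversalDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // One pass over the table: each entry is handed over directly.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // The roundabout way: every get() re-hashes the key and
        // walks its bin again.
        for (String k : map.keySet()) {
            System.out.println(k + " -> " + map.get(k));
        }
    }
}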

1.3.3.2 Values

Values works just like KeySet; only the accept call in its forEach method differs.

/**
 * Returns a {@link Collection} view of the values contained in this map.
 * The collection is backed by the map, so changes to the map are
 * reflected in the collection, and vice-versa.  If the map is
 * modified while an iteration over the collection is in progress
 * (except through the iterator's own <tt>remove</tt> operation),
 * the results of the iteration are undefined.  The collection
 * supports element removal, which removes the corresponding
 * mapping from the map, via the <tt>Iterator.remove</tt>,
 * <tt>Collection.remove</tt>, <tt>removeAll</tt>,
 * <tt>retainAll</tt> and <tt>clear</tt> operations.  It does not
 * support the <tt>add</tt> or <tt>addAll</tt> operations.
 *
 * @return a view of the values contained in this map
 */
public Collection<V> values() {
    Collection<V> vs = values;
    if (vs == null) {
        vs = new Values();
        values = vs;
    }
    return vs;
}

final class Values extends AbstractCollection<V> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<V> iterator()     { return new ValueIterator(); }
    public final boolean contains(Object o) { return containsValue(o); }
    public final Spliterator<V> spliterator() {
        return new ValueSpliterator<>(HashMap.this, 0, -1, 0, 0);
    }
    public final void forEach(Consumer<? super V> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0 && (tab = table) != null) {
            int mc = modCount;
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next)
                    action.accept(e.value);
            }
            if (modCount != mc)
                throw new ConcurrentModificationException();
        }
    }
}

In its forEach method we see:

action.accept(e.value);

which hands back the value. Note that Values does not define a remove method of its own.

1.3.3.3 EntrySet

EntrySet follows the same pattern as KeySet and Values:

/**
 * Returns a {@link Set} view of the mappings contained in this map.
 * The set is backed by the map, so changes to the map are
 * reflected in the set, and vice-versa.  If the map is modified
 * while an iteration over the set is in progress (except through
 * the iterator's own <tt>remove</tt> operation, or through the
 * <tt>setValue</tt> operation on a map entry returned by the
 * iterator) the results of the iteration are undefined.  The set
 * supports element removal, which removes the corresponding
 * mapping from the map, via the <tt>Iterator.remove</tt>,
 * <tt>Set.remove</tt>, <tt>removeAll</tt>, <tt>retainAll</tt> and
 * <tt>clear</tt> operations.  It does not support the
 * <tt>add</tt> or <tt>addAll</tt> operations.
 *
 * @return a set view of the mappings contained in this map
 */
public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}

final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<Map.Entry<K,V>> iterator() {
        return new EntryIterator();
    }
    public final boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<?,?> e = (Map.Entry<?,?>) o;
        Object key = e.getKey();
        Node<K,V> candidate = getNode(hash(key), key);
        return candidate != null && candidate.equals(e);
    }
    public final boolean remove(Object o) {
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            Object key = e.getKey();
            Object value = e.getValue();
            return removeNode(hash(key), key, value, true, true) != null;
        }
        return false;
    }
    public final Spliterator<Map.Entry<K,V>> spliterator() {
        return new EntrySpliterator<>(HashMap.this, 0, -1, 0, 0);
    }
    public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0 && (tab = table) != null) {
            int mc = modCount;
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next)
                    action.accept(e);
            }
            if (modCount != mc)
                throw new ConcurrentModificationException();
        }
    }
}

Removal by entry is supported. In forEach:

action.accept(e);

the whole entry object is passed.

1.3.4 Iterators: HashIterator and the Parallel HashMapSpliterators

The remaining inner classes of HashMap are all iterator-related. In single-threaded code we use Iterator. As noted above, HashMap exposes its internal table through three views, KeySet, Values, and EntrySet, which are served by KeyIterator, ValueIterator, and EntryIterator respectively. Given how widely HashMap is used, traversal may need to be sped up by running in parallel, so HashMap also provides the parallel spliterators KeySpliterator, ValueSpliterator, and EntrySpliterator.

These are analyzed in detail in later articles; for reasons of length, this article focuses on HashMap's fundamentals.

2. Fundamentals

2.1 Basic Structure of HashMap

From Part 1 we now know HashMap's basic concepts and operations. Two terms need to be understood:

  • bucket: a slot of HashMap's internal array, i.e., one element of the table array. As shown in the figure below, HashMap derives each Node's bucket from the key's hash code taken modulo the table size (implemented as a bit mask, as we will see).
  • bin: when several keys map to the same bucket, the colliding entries are chained together as a linked list or a red-black tree. That list/tree is called a bin.

The figure above only illustrates HashMap's basic composition; the red-black tree and the number of buckets shown are not meant to be realistic.

From it we can see that HashMap is essentially an array of Nodes combined with linked lists / red-black trees.

When a chain reaches length 8 and the table capacity is at least 64, and no resize is triggered at that point, the bin is converted from a linked list into a red-black tree. Pay close attention to this condition: reaching 8 does not automatically mean treeification; it may instead trigger a table resize. Consequently, while the capacity is below 64, bins longer than 8 can occur.

The following code demonstrates this:

import java.util.HashMap;

public class TreeifyDemo {

    public static void main(String[] args) {
        HashMap<CustomerKey, Integer> map = new HashMap<>();
        // All keys hash to 1, so every entry lands in the same bucket.
        for (int k : new int[] {1, 17, 33, 49, 65, 81, 97, 113, 129}) {
            map.put(new CustomerKey(k), k);
        }
        System.out.println(map.size()); // 9 entries, all chained in one bin
    }

    private static class CustomerKey {
        int key;

        public CustomerKey(int key) {
            this.key = key;
        }

        @Override
        public int hashCode() {
            return 1; // force every key into the same bucket
        }
    }
}

We force every key's hash code to 1. Based on the usual claim, the bin should turn into a red-black tree once it holds 8 elements. In fact, inspecting the map in a debugger shows the result is still one long linked list: with only 9 entries, the table capacity is far below 64, so the insertion that crosses the threshold triggers a resize instead of treeification.

So the condition for converting a chain into a red-black tree must be understood precisely: the chain length must reach TREEIFY_THRESHOLD (8) and the table capacity must already be at least MIN_TREEIFY_CAPACITY (64); otherwise treeifyBin simply resizes the table:

(binCount >= TREEIFY_THRESHOLD) && (tab.length >= MIN_TREEIFY_CAPACITY)

2.2 Bit Operations in HashMap

2.2.1 Resizing

When size exceeds threshold, HashMap resizes. The new capacity is computed with a shift:

newCap = oldCap << 1

So the capacity of a HashMap is always a power of two: starting from the initial capacity of 16, each resize doubles it. Note that HashMap has no shrinking mechanism; the table can only grow, never shrink.

2.2.2 Bucket Index Calculation

Another important detail is how the bucket index is computed. Normally we would use the modulo operator %, but as anyone familiar with low-level computing knows, bit operations are the fastest thing a CPU does. When b is a power of two:

// when b is a power of two
a % b == a & (b - 1)

Since HashMap's initial capacity is 16 and every resize is a left shift, its capacity always satisfies this rule. The bucket is therefore computed as:

first = tab[(n - 1) & hash]

where first is the head node of the computed bucket; (n - 1) & hash yields the index very cheaply.

A worked example with hash values 189 and 205 and n = 16:

189 = 1011 1101, n - 1 = 15 = 0000 1111, so 189 & 15 = 1101 = 13
205 = 1100 1101, so 205 & 15 = 1101 = 13

Both come out to bucket 13.
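The equivalence is easy to verify with a throwaway sketch (class name and test values are ours):

public class MaskVsMod {
    public static void main(String[] args) {
        int n = 16; // table length, a power of two
        for (int hash : new int[] {189, 205, 12345}) {
            // for power-of-two n, hash % n == hash & (n - 1)
            System.out.printf("hash=%-6d %%: %-3d &: %d%n",
                    hash, hash % n, hash & (n - 1));
        }
    }
}

Besides speed, the mask also never produces a negative index, whereas Java's % can for a negative hash value ((-7) % 16 == -7, while (-7) & 15 == 9).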

2.2.3 split

HashMap resizes, so how are the existing elements redistributed afterwards? If growth followed no pattern, say adding 1 each time (from 8 to 9), there would be no regularity at all: every node would have to recompute its index from scratch, which would be inefficient. Instead, HashMap always doubles, and that creates a pattern: the nodes of each bucket split into exactly two groups. One group keeps its original bucket index; the other moves to the original index plus the old capacity (oldCap + index). In other words, each bin splits into a "low" part that stays where it was and a "high" part that moves to the newly added range.

The trick for telling low from high is remarkably simple:

if ((e.hash & bit) == 0)

where bit is the old capacity.

Continuing the earlier example after resizing to 32 (n - 1 = 31):

189 = 1011 1101, 189 & 31 = 1 1101 = 29
205 = 1100 1101, 205 & 31 = 0 1101 = 13

The mask has gained exactly one bit, the old-capacity bit 16 (1 0000). For 189 that bit is set (189 & 16 = 16), so it moves to the high position 13 + 16 = 29; for 205 it is clear (205 & 16 = 0), so it stays at 13. Two hashes that shared a bucket diverge only according to this one new bit, so that bit is all we need to examine.

With oldCap = 16, e.hash & oldCap is either 0 or 16. The code is written generically: whatever the capacity, a result of 0 means the node keeps its low position; otherwise it moves to the high position.
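The rule is easy to replay in code (a standalone sketch using the same two hash values):

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash : new int[] {189, 205}) {
            int oldIndex = hash & (oldCap - 1);
            // the single old-capacity bit decides low half vs high half
            int newIndex = (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            System.out.printf("hash=%d old=%d new=%d%n", hash, oldIndex, newIndex);
        }
    }
}

It prints old=13 new=29 for 189 and old=13 new=13 for 205, matching the hand calculation above.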

HashMap truly deserves its reputation as master-level code: the efficiency of resizing, indexing, and splitting was designed in from the very start.

2.2.4 Why HashMap's DEFAULT_INITIAL_CAPACITY Is 16

This question also comes up frequently in interviews. It really probes understanding of low-level computing fundamentals, and it has genuine depth.

As the three sections above show, HashMap relies heavily on bit operations for performance: resizing, indexing, and splitting are all bitwise. This requires the initial table length to be a power of two; if it were not, the first resize would break both split and the bucket calculation.

So the initial length must be one of 2, 4, 8, 16, 32, and so on.

But why 16 specifically? There is no documented rationale; it appears to be an empirical value. Too small, and the map resizes almost immediately; too large, and space is wasted. 16 sits comfortably in between.

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

As the comment says, this value must be a power of two.

3. Constructors

Now let's look at HashMap's constructors:

3.1 HashMap()

The no-argument constructor is the one we use most:

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

It only sets the default load factor of 0.75; the default capacity of 16 is applied later.

3.2 HashMap(int initialCapacity)

HashMap also provides a constructor that specifies the initial capacity:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

It uses the default load factor of 0.75.

3.3 HashMap(int initialCapacity, float loadFactor)

Both the initial capacity and the load factor can be specified:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

The constructor validates initialCapacity: a negative value throws IllegalArgumentException, and anything above MAXIMUM_CAPACITY is clamped to MAXIMUM_CAPACITY. loadFactor must be positive and must not be NaN. Finally, tableSizeFor computes the threshold from the requested capacity.

3.4 HashMap(Map<? extends K, ? extends V> m)

A HashMap can also be built directly from another Map.

/**
 * Constructs a new <tt>HashMap</tt> with the same mappings as the
 * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
 * default load factor (0.75) and an initial capacity sufficient to
 * hold the mappings in the specified <tt>Map</tt>.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

This constructor delegates to putMapEntries, which inserts all of the given map's entries.

/**
 * Implements Map.putAll and Map constructor.
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    // if the source map is non-empty
    if (s > 0) {
       // if the table has not been allocated yet
        if (table == null) { // pre-size
            // compute the capacity needed to hold s entries at the current load factor
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
            // the threshold is now pre-sized
        }
        // if the number of entries to insert exceeds the threshold, resize
        else if (s > threshold)
            resize();
        // insert all entries one by one
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

putMapEntries is also the internal implementation of putAll. In other words, putAll and the copy constructor go through exactly the same code path.
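A small usage sketch showing the two equivalent paths (illustrative only):

import java.util.HashMap;
import java.util.Map;

public class CopyDemo {
    public static void main(String[] args) {
        Map<String, Integer> src = new HashMap<>();
        src.put("a", 1);
        src.put("b", 2);

        // both paths funnel into putMapEntries(m, ...)
        Map<String, Integer> viaCtor = new HashMap<>(src);
        Map<String, Integer> viaPutAll = new HashMap<>();
        viaPutAll.putAll(src);

        System.out.println(viaCtor.equals(viaPutAll)); // true
    }
}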

4. Important Methods

4.1 tableSizeFor

This method finds the smallest power of two greater than or equal to the given cap.

static final int tableSizeFor(int cap) {
	int n = cap - 1;
	n |= n >>> 1;
	n |= n >>> 2;
	n |= n >>> 4;
	n |= n >>> 8;
	n |= n >>> 16;
	return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The code is not easy to read at first. Take cap = 10 as an example and follow the computation:

n = 10 - 1 = 9        = 0000 1001
n |= n >>> 1          = 0000 1101
n |= n >>> 2          = 0000 1111
n |= n >>> 4          = 0000 1111
n |= n >>> 8          = 0000 1111
n |= n >>> 16         = 0000 1111   (15)
return n + 1          = 16

The result is 16. Applying the same cascade to a value near Integer.MAX_VALUE fills every bit below the highest set bit. The pattern is now clear: the shifts smear the current highest 1 bit across all lower positions, turning everything below it into 1s, and the final +1 rounds up to the next power of two. (The initial cap - 1 ensures an exact power of two is returned unchanged.)
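To experiment with the rounding behavior, the method can be copied into a scratch class (MAXIMUM_CAPACITY reproduced from the source; the driver values are arbitrary):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // same bit trick as HashMap.tableSizeFor: smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 10, 16, 17, 1_000_000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // prints 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 1000000 -> 1048576
    }
}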

4.2 get

get is the most basic method of HashMap.

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

Two methods are hidden inside get: getNode and hash.

/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
       // compute the bucket index with a bit mask; this step is crucial
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            // hashes match, now check key equality
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // otherwise traverse the chain or the red-black tree
        if ((e = first.next) != null) {
            // if the bin is a tree, use the tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The core of get is searching the chain or the red-black tree by hash value. If the bin is a tree, the tree lookup is used; since a red-black tree is a sorted tree, this is significantly faster than a full scan of a linked list.

4.3 hash

The hash method also deserves attention:

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The gist of the comment: the method computes key.hashCode() and spreads (XORs) the higher bits of the hash onto the lower bits. Because the table uses power-of-two masking, the bits above the current mask never participate in the index calculation, so sets of hashes that vary only in the high bits would always collide. The transform folds the impact of the higher bits downward, letting entries spread more evenly across the table.

Let's see why. The earlier index examples showed that the index is computed from the low bits only: with n = 2 the index depends on the lowest bit, with n = 4 on the lowest 2 bits, with n = 8 on the lowest 3, with n = 16 on the lowest 4, with n = 32 on the lowest 5, and so on. If some set of keys varies mainly in the high bits while the low bits barely change, every key lands in the same bucket no matter how often the table grows. Such data absolutely exists in practice, and it defeats HashMap's goal of distributing entries evenly.

In real use the table rarely grows large enough for the high bits to matter directly, so the bucket index effectively depends only on the low bits, inviting uneven distribution. To avoid this, HashMap mixes the highest bits into the lowest via XOR, letting as many bits as possible participate in the index.

That is the purpose of >>> 16: it brings the high 16 bits down. As for why ^ is used rather than & or |: for independent random bits, AND is biased toward 0 (three of the four input combinations yield 0) and OR is biased toward 1, while XOR yields 0 and 1 with equal probability, so it does not skew the result.

One can see that the author of HashMap spared no effort to squeeze out performance; only by considering all of these details can such efficient code be written. It calls to mind the Disruptor framework, whose source is also well worth reading.
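A sketch that makes the high-bit problem visible (the two hash values are contrived so that they differ only above bit 15; the class name is ours):

public class HashSpreadDemo {
    // same transform as HashMap.hash, minus the null check
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length
        int h1 = 0x00010000, h2 = 0x00020000; // differ only in the high bits

        // without spreading, both mask down to bucket 0 and collide
        System.out.println((h1 & (n - 1)) + " vs " + (h2 & (n - 1)));
        // after spreading, they land in different buckets (1 vs 2)
        System.out.println((spread(h1) & (n - 1)) + " vs " + (spread(h2) & (n - 1)));
    }
}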

4.4 put

put, like get, is one of the most important methods we use on HashMap.

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
   // note: this calls the hash method above, which mixes in the high bits
    return putVal(hash(key), key, value, false, true);
}

The underlying work is done by putVal.

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
       // if the table is null or empty, resize() allocates it first
        n = (tab = resize()).length;
     // compute the bucket index with &; if the bucket is empty, the new node becomes its head
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
       // otherwise handle the chain or the red-black tree
        Node<K,V> e; K k;
        // does the head node already match the key?
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // tree bin: insert via the tree method
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
        // otherwise walk the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // chain reached TREEIFY_THRESHOLD: treeify (or resize inside treeifyBin)
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // a node with this key already exists: update its value here
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // hook for subclasses (e.g., LinkedHashMap) to run post-access logic
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    // post-insertion hook (used by LinkedHashMap, e.g., for eviction)
    afterNodeInsertion(evict);
    return null;
}

4.5 resize

resize grows HashMap's table. It is a very important method: whenever the threshold is exceeded, resize is called. Note that when a HashMap is constructed with a given capacity, the table is not allocated; it stays null, and only the threshold is computed from the requested cap.

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // oldCap is the current table length
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // the table already exists
    if (oldCap > 0) {
       // already at maximum capacity: raise the threshold to Integer.MAX_VALUE and stop growing
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // newCap is twice oldCap
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // the threshold doubles as well
            newThr = oldThr << 1; // double threshold
    }
    // oldCap == 0 but oldThr > 0: first put after a sizing constructor; the threshold holds the initial capacity
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { 
         // otherwise fall back to the defaults
         // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
   // if newThr is still 0, compute it from the new capacity and load factor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
     // only here is the new array actually allocated
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // if there was an old table, move its nodes over
    if (oldTab != null) {
      // iterate over the old table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // a single node: recompute its index in the new table directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
               // a tree bin: split it
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // a chain: split it while preserving order
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // walk the chain
                    do {
                        next = e.next;
                        // the old-capacity bit decides low half vs high half;
                        // 0 means the node stays in the low half
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // otherwise it goes to the high half
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // attach the low and high lists to their buckets
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        // the high bucket is j + oldCap
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

The chain split here also uses the high/low-bit trick, e.hash & oldCap. The TreeNode version (split) is covered in detail in the TreeNode discussion later.

4.6 treeifyBin

This method converts a linked-list bin into a red-black tree.

/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
     // if the table is too small, resize instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // otherwise locate the bucket
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        // walk the chain, replacing each Node with a TreeNode
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            // build the actual red-black tree
            hd.treeify(tab);
    }
}

// For treeifyBin
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}

The chain is first converted, in list order, into TreeNode elements. At this point it is still the same linked list as before; only the element type has changed. Then TreeNode's treeify method is called. The resulting structure therefore behaves as both a linked list and a red-black tree at once: the list form is used for splitting and traversal, the tree form for lookup.

Suppose we have the red-black tree below (the values shown are insertion order, for illustration only; a real red-black tree may not look like this). The linked list still exists on top of this tree, connected through the next pointers.

4.7 remove

remove removes the given key from the HashMap's table.

/**
 * Removes the mapping for the specified key from this map if present.
 *
 * @param  key key whose mapping is to be removed from the map
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V remove(Object key) {
    Node<K,V> e;
    // the key's hash is computed with the bit-mixing hash method described earlier
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

It delegates to removeNode:

/**
 * Implements Map.remove and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to match if matchValue, else ignored
 * @param matchValue if true only remove if value is equal
 * @param movable if false do not move other nodes while removing
 * @return the node, or null if none
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        // locate the bucket with the & mask
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // does the head node match?
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // otherwise check the rest of the bin
        else if ((e = p.next) != null) {
            // tree bin: use the tree lookup
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
            // otherwise walk the chain looking for the key
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            // post-removal hook (used by LinkedHashMap)
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

Since HashMap never shrinks its table, remove is relatively simple. The only extra consideration is that a tree bin that becomes too small after removal (on the order of UNTREEIFY_THRESHOLD, 6) is converted back into a linked list.

4.8 clear

Compared with the previous methods, clear is trivial:

/**
 * Removes all of the mappings from this map.
 * The map will be empty after this call returns.
 */
public void clear() {
    Node<K,V>[] tab;
    modCount++;
    if ((tab = table) != null && size > 0) {
        size = 0;
        for (int i = 0; i < tab.length; ++i)
            tab[i] = null;
    }
}

clear walks the entire table and nulls out every slot, which lets all the nodes be garbage-collected; the table's capacity itself is never reduced.

4.9 containsValue

/**
 * Returns <tt>true</tt> if this map maps one or more keys to the
 * specified value.
 *
 * @param value value whose presence in this map is to be tested
 * @return <tt>true</tt> if this map maps one or more keys to the
 *         specified value
 */
public boolean containsValue(Object value) {
    Node<K,V>[] tab; V v;
    if ((tab = table) != null && size > 0) {
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                if ((v = e.value) == value ||
                    (value != null && value.equals(v)))
                    return true;
            }
        }
    }
    return false;
}

containsValue is also straightforward: it simply scans the table with two nested loops; there is nothing special about it.

5. Summary

This article has walked through the HashMap source in full. Along the way we saw the many efforts made to boost HashMap's performance. To close, let's revisit the key points in Q&A form.

  • 1. What is HashMap's basic structure? See section 2.1: an array plus linked lists / red-black trees. Note that a tree bin still carries a linked-list layer. If this comes up in an interview, start from TreeNode's inheritance chain: TreeNode extends LinkedHashMap.Entry, which extends Node, which implements Map.Entry, with each level adding fields; a TreeNode is roughly twice the size of a Node. The list links retained inside tree bins are what make split so convenient.
  • 2. Why is HashMap's initial capacity 16? A classic heavyweight interview question, covered in detail in section 2. Briefly: HashMap leans heavily on bit operations, in resizing, splitting, and index calculation, which requires the table length to be a power of two. Also note that specifying a capacity does not create the table: the table is only allocated during resize, based on the threshold, and tableSizeFor merely computes the next power of two at or above the requested cap. Among the candidate powers of two (2, 4, 8, 16, 32, ...), larger wastes space and smaller forces constant resizing, so 16 is the compromise. Walking through tableSizeFor here is a good idea as well.
  • 3. Why is the treeify threshold 8? Another frequent interview question. Per the comments, with a reasonably dispersed hash function, bin sizes follow a Poisson distribution. Combining the distribution with HashMap's defaults, the expected frequency of a bin of size k is:

exp(-0.5) * pow(0.5, k) / factorial(k)

At k = 8 the probability is already below one in ten million, so 8 is generally considered the right point to switch to a tree.

  • 4. Which bit operations does HashMap use, and what are they for? See section 2 for the full summary. First, resizing by shift: a left shift doubles the capacity. Second, the bucket index: hashCode & (size - 1). Third, the high/low split during resize: (hashCode & oldSize) == 0 keeps a node in the low half; otherwise it moves to index + oldSize. Fourth, the hash method's high/low mixing: hashCode ^ (hashCode >>> 16). See the corresponding sections above for details.
  • 5. When is a bin treeified? A chain does not become a red-black tree simply because its length reaches 8; in addition, the table capacity must already be at least 64, otherwise the put triggers a resize instead. See above for details.

These are the conclusions drawn from reading the HashMap source, which rewards repeated study.
