HashMap has two important properties: size and load factor. I went through the Java documentation, and it says 0.75f is the default load factor, but I can't find what it is actually used for.
Can someone describe the different scenarios in which we need to set the load factor, and what some ideal values are for different cases?
Answer:
An instance of HashMap has two parameters that affect its performance: initial capacity and load
factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply
the capacity at the time the hash table is created. The load factor is a measure of how full the
hash table is allowed to get before its capacity is automatically increased. When the number of
entries in the hash table exceeds the product of the load factor and the current capacity, the hash
table is rehashed (that is, internal data structures are rebuilt) so that the hash table has
approximately twice the number of buckets.
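To make the resize rule concrete, here is a small, hypothetical simulation (ResizeSimulation and simulate are illustrative names, not JDK APIs). It only models the rule quoted above: whenever the entry count exceeds capacity * load factor, the bucket count doubles. It does not inspect a real HashMap, whose internal bookkeeping may differ in details.

```java
// Hypothetical model of the growth rule described in the HashMap Javadoc.
// It does NOT read a real HashMap's internals; it only replays the rule.
public class ResizeSimulation {

    /** Returns the bucket count after inserting `entries` keys, printing each resize. */
    static int simulate(int initialCapacity, float loadFactor, int entries) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor); // resize once size exceeds this
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {
                capacity *= 2; // rebuild with twice as many buckets
                threshold = (int) (capacity * loadFactor);
                System.out.println("entry " + size + ": resized to " + capacity + " buckets");
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        // Default capacity (16) and default load factor (0.75f).
        int finalCapacity = simulate(16, 0.75f, 100);
        System.out.println("final capacity: " + finalCapacity);
    }
}
```

With the defaults, this should report resizes at entries 13, 25, 49 and 97, ending at 256 buckets for 100 entries. Running it with a higher load factor such as 1.0f shows fewer resizes (less space), at the cost of fuller buckets.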
As a general rule, the default load factor (.75) offers a good tradeoff between time and space
costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of
the operations of the HashMap class, including get and put). The expected number of entries in the
map and its load factor should be taken into account when setting its initial capacity, so as to
minimize the number of rehash operations. If the initial capacity is greater than the
maximum number of entries divided by the load factor, no rehash operations
will ever occur.
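Following the sizing rule above, a minimal sketch: compute an initial capacity from the expected number of entries and pass it to the HashMap(int initialCapacity, float loadFactor) constructor. The helper capacityFor is a hypothetical name, and the sketch assumes you know the entry count in advance.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: pick an initial capacity so that expectedEntries never exceeds
// capacity * loadFactor, which per the Javadoc means no rehash should occur.
public class PresizedMap {

    /** Rounds expectedEntries / loadFactor up to the next whole number. */
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1_000;
        float loadFactor = 0.75f;
        int capacity = capacityFor(expected, loadFactor); // 1334

        Map<Integer, String> map = new HashMap<>(capacity, loadFactor);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("requested capacity " + capacity + ", size " + map.size());
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so the real table is usually somewhat larger than requested. If you are on Java 19 or later, HashMap.newHashMap(int numMappings) performs an equivalent calculation for the default load factor.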
As with all performance optimizations, it is a good idea to avoid optimizing
things prematurely (i.e. without hard data on where the bottlenecks are).
It's all about performance.