jdk源码分析之Collections--二分查找优化

叔牙

发布于 2020-11-19 15:02:38

3990

发布于 2020-11-19 15:02:38

文章被收录于专栏：一个执拗的后端搬砖工

我们基本上都使用过结合工具类Collections，其重要功能和作用是对结合类的一些操作，比如，查找集合中指定元素，集合排序以及集合类型排序等等。此篇我们将对Collections源码中的二分查找做详细分析以及提出优化方案。

一、Collections二分查找源码及原理分析

首先，先看一下Collections二分查找的源码：

/**

* Searches the specified list for the specified object using the binary

* search algorithm. The list must be sorted into ascending order

* according to the {@linkplain Comparable natural ordering} of its

* elements (as by the {@link #sort(List)} method) prior to making this

* call. If it is not sorted, the results are undefined. If the list

* contains multiple elements equal to the specified object, there is no

* guarantee which one will be found.

*

* This method runs in log(n) time for a "random access" list (which

* provides near-constant-time positional access). If the specified list

* does not implement the {@link RandomAccess} interface and is large,

* this method will do an iterator-based binary search that performs

* O(n) link traversals and O(log n) element comparisons.

*

* @param <T> the class of the objects in the list

* @param list the list to be searched.

* @param key the key to be searched for.

* @return the index of the search key, if it is contained in the list;

* otherwise, <tt>(-(insertion point) - 1)</tt>. The

* insertion point is defined as the point at which the

* key would be inserted into the list: the index of the first

* element greater than the key, or <tt>list.size()</tt> if all

* elements in the list are less than the specified key. Note

* that this guarantees that the return value will be >= 0 if

* and only if the key is found.

* @throws ClassCastException if the list contains elements that are not

* mutually comparable (for example, strings and

* integers), or the search key is not mutually comparable

* with the elements of the list.

*/

public static <T>

int binarySearch(List<? extends Comparable<? super T>> list, T key) {

if (list instanceof RandomAccess || list.size()<BINARYSEARCH_THRESHOLD)

return Collections.indexedBinarySearch(list, key);

else

return Collections.iteratorBinarySearch(list, key);

}

方法的作用和用法简单翻译描述一下，该防范有两个入参，list是待查询的列表，key是要查询的元素，方法的作用是查询出key在list中的位置并返回。方法使用的限定条件是，list必须是按照元素T的排序方法有序的，否则查询结果不准确，还有一点主要注意的是，如果list中包含多个相同元素，程序将不确定哪个会被找到(而不是我们常理解的默认第一个被找到)。如果找到就返回元素下表位置，如果没有找到则返回-1。

该方法根据入参列表的类型和长度分别采用两种不同的实现，如果列表实现了RandomAccess接口(可以简单理解为ArrayList)或者列表长度小于5000(可以理解为LinkedList类型长度不是很大)，将使用Collections.indexedBinarySearch

方法查找，否则(可以理解为LinkedList实现且长度大于等于5000)使用Collections.iteratorBinarySearch查找。

首先先看一下indexedBinarySearch实现：

private static <T>

int indexedBinarySearch(List<? extends Comparable<? super T>> list, T key) {

int low = 0;

int high = list.size()-1;

while (low <= high) {

int mid = (low + high) >>> 1;

Comparable<? super T> midVal = list.get(mid);

int cmp = midVal.compareTo(key);

if (cmp < 0)

low = mid + 1;

else if (cmp > 0)

high = mid - 1;

else

return mid; // key found

}

return -(low + 1); // key not found

}

这是二分查找的java典型实现，我们先用一张图描述其原理

其核心点是三个下标的移动，首先找到列表的头部下标，尾部下标以及中间下标，每次拿中间下标元素值与key相比如果相等直接返回，如果小于中间元素则mid和hign整体向low移动，如果大于中间元素mid和low整体向hign移动，直到找到元素返回下标或者没有找到返回最近的元素下标加1。

再看一下iteratorBinarySearch实现：

private static <T>

int iteratorBinarySearch(List<? extends Comparable<? super T>> list, T key)

{

int low = 0;

int high = list.size()-1;

ListIterator<? extends Comparable<? super T>> i = list.listIterator();

while (low <= high) {

int mid = (low + high) >>> 1;

Comparable<? super T> midVal = get(i, mid);

int cmp = midVal.compareTo(key);

if (cmp < 0)

low = mid + 1;

else if (cmp > 0)

high = mid - 1;

else

return mid; // key found

}

return -(low + 1); // key not found

}

其实该方法的实现逻辑和indexedBinarySearch类似，不同之处是获取元素的方式，前者获取元素直接调用list.get(mid)，而入参的类型决定了其不太适合使用这种方式(区别可以参见ArrayList和LinkedList遍历元素性能对比)，而此处调用了一个内部封装的get(i, mid)方法，看一下实现：

/**

* Gets the ith element from the given list by repositioning the specified

* list listIterator.

*/

private static <T> T get(ListIterator<? extends T> i, int index) {

T obj = null;

int pos = i.nextIndex();

if (pos <= index) {

do {

obj = i.next();

} while (pos++ < index);

} else {

do {

obj = i.previous();

} while (--pos > index);

}

return obj;

}

入参是迭代器和索引位置，既然是使用迭代器访问，那就注定了不能使用类似RandomAccess支持的随机访问的方式，此方法的原理是，使用迭代器遍历元素，从元素上获取索引值，然后索引值与入参索引位置比较，如果找到就返回元素，否则返回null(由于入参index是在迭代器索引值范围内，不会返回null)。此处也再次说明了一个问题，ArrayList比LinkedList随机查询性能好。

二、Collections二分查找优化点分析与实现

上述分析了Collections中的二分查找原理与实现，我们来举例思考一个问题，如果给定一个列表有从1到10元素，那么如果查询-1或者11会如何？

Collections.indexedBinarySearch代码，查询-1步骤如下：

Collections.indexedBinarySearch中要执行三次while循环才结束调用。同样，我们查询11步骤如下：

这次执行四次才结束调用，到这里我们思考一下，既然列表已经是排好序的，那么在查询一个元素的时候，如果目标元素比列表中最小的值小或者比最大的值大，是不是可以直接宣告不存在，而不是再去列表中使用二分方式再查询一遍，就算二分查询性能很好，那么查与不查相比，毫无疑问是不查的性能好，就像0和无穷接近0的正数一样道理。

那么接下来我们对Collections做一下扩展实现，由于Collections构造器是私有，我们无法继承，所以依照Collections的二分查找方法，拷贝一份在上边改造。

binarySearch：

public static <T>

int binarySearch(List<? extends Comparable<? super T>> list, T key) {

if (list instanceof RandomAccess || list.size()<BINARYSEARCH_THRESHOLD)

return CustomCollections.indexedBinarySearch(list, key);

else

return CustomCollections.iteratorBinarySearch(list, key);

}

indexedBinarySearch:

private static <T>

int indexedBinarySearch(List<? extends Comparable<? super T>> list, T key)

{

int low = 0;

int high = list.size()-1;

if(list.get(low).compareTo(key) > 0) {//key小于最小值

return -1;

}

if(list.get(high).compareTo(key) < 0) {//或大于最大值

return -(high + 1);

}

while (low <= high) {

int mid = (low + high) >>> 1;

Comparable<? super T> midVal = list.get(mid);

int cmp = midVal.compareTo(key);

if (cmp < 0)

low = mid + 1;

else if (cmp > 0)

high = mid - 1;

else

return mid; // key found

}

return -(low + 1); // key not found

}

iteratorBinarySearch:

private static <T>

int iteratorBinarySearch(List<? extends Comparable<? super T>> list, T key)

{

int low = 0;

int high = list.size()-1;

ListIterator<? extends Comparable<? super T>> i = list.listIterator();

if(get(i, low).compareTo(key) > 0) {

return -1;

}

if(get(i,high).compareTo(key) < 0) {

return -list.size();

}

while (low <= high) {

int mid = (low + high) >>> 1;

Comparable<? super T> midVal = get(i, mid);

int cmp = midVal.compareTo(key);

if (cmp < 0)

low = mid + 1;

else if (cmp > 0)

high = mid - 1;

else

return mid; // key found

}

return -(low + 1); // key not found

}

对比Collections，二分查找的具体实现加了这样一段代码：

if(list.get(low).compareTo(key) > 0) {//key小于最小值

return -1;

}

if(list.get(high).compareTo(key) < 0) {//或大于最大值

return -(high + 1);

}

浅显易懂，假如目标元素key比list中最小元素还小直接返回-1，没必要走二分逻辑；假如目标元素key比list中最大元素还大，直接返回-(hign+1),同样没必要走二分逻辑。

三、测试验证

接下来，我们使用代码验证猜想，测试代码：

jdk自带Collections二分查找，遍历第一次low=0，mid=4，hign=9：

遍历第二次low=0，mid=1，hign=3：

遍历第三次low=0，mid=0，hign=0：

遍历第四次，不满足low <= high，程序结束。

改造后的CustomCollections二分查找：

可以看出，low=0，hign=9，逻辑走进了我们加的一层校验，命中目标元素小于list列表中最小的元素，这时候直接返回-1，查询结束，不再进入二分查询逻辑。

总结

经过上述一系列分析与测试，我们证实了Collections二分查询逻辑存在优化点，并且用代码验证了我们的猜想，在列表很大的场景下，使用本篇优化过的CustomCollections二分查找必定会带来不小的性能提升，希望给大家在日常开发中带来帮助！

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2018-08-11，如有侵权请联系 cloudcommunity@tencent.com 删除

c++

java

http

本文分享自 PersistentCoder 微信公众号，前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

本文参与腾讯云自媒体同步曝光计划，欢迎热爱写作的你一起参与！

c++

java

http

登录后参与评论

0 条评论

热度

jdk源码分析之Collections--二分查找优化

jdk源码分析之Collections--二分查找优化

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐