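When it comes to pagination in Elasticsearch, the first thing that comes to mind is probably the MySQL-style approach: pass in a starting offset and a page size. Elasticsearch does provide a similar mechanism, as the code below shows: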
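import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// highLevelClient is a RestHighLevelClient field of the enclosing test class
@Test
public void searchFromSize() throws IOException {
    SearchRequest searchRequest = new SearchRequest("sub_bank1120");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    // 10 hits per page
    searchSourceBuilder.size(10);
    // start at offset 10, i.e. skip the first 10 hits
    searchSourceBuilder.from(10);
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = highLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    for (SearchHit s : searchHits) {
        System.out.println(s.getSourceAsString());
    }
}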
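However, this approach has a serious limitation: from and size cannot be too large. Their sum must not exceed index.max_result_window (10000 by default). Beyond that, the request fails with an
org.elasticsearch.client.ResponseException:
Result window is too large, from + size must be less than or equal to: [10000] but was [11010]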
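Why does Elasticsearch cap the search depth with index.max_result_window? Because deep paging consumes a lot of memory. With from set to 10000 and size set to 10, for example, every shard has to collect its own top 10010 hits in sort order and return them to the coordinating node. The coordinating node then merges the entries from all shards, sorts them again, keeps only the 10 hits that follow the first 10000 in the global order, and throws the rest away. If you really need to page that deep, use scroll or search after instead.
For a scroll pagination example, see: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/6.8/java-rest-high-search-scroll.html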
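To make the cost of deep from/size paging more concrete, here is a rough back-of-the-envelope sketch. It does not call Elasticsearch. The class name `DeepPagingCost`, its methods, and the shard count of 5 are assumptions made up for illustration. It only models the two points above: the from + size check against index.max_result_window, and the number of entries the coordinating node has to merge for a single page.

```java
// Back-of-the-envelope model of from/size paging cost; not an Elasticsearch API.
class DeepPagingCost {

    // Default value of index.max_result_window.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // True when the request would be rejected with "Result window is too large".
    static boolean exceedsWindow(int from, int size, int maxResultWindow) {
        return (long) from + size > maxResultWindow;
    }

    // Each shard returns its top (from + size) entries, so the coordinating node
    // has to merge shardCount * (from + size) entries to produce one page.
    static long entriesMergedByCoordinator(int shardCount, int from, int size) {
        return (long) shardCount * ((long) from + size);
    }

    public static void main(String[] args) {
        int shards = 5; // hypothetical shard count
        System.out.println(exceedsWindow(10, 10, DEFAULT_MAX_RESULT_WINDOW));    // false
        System.out.println(exceedsWindow(11000, 10, DEFAULT_MAX_RESULT_WINDOW)); // true
        System.out.println(entriesMergedByCoordinator(shards, 10000, 10));       // 50050
    }
}
```

Under these assumptions, a single page of 10 hits at offset 10000 makes the coordinating node handle 50050 entries, which is why the window is capped.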
A search after example is shown in the code below:
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

/**
 * Pages through results with search after.
 * @throws IOException if a request to Elasticsearch fails
 */
@Test
public void searchAfter() throws IOException {
    SearchRequest searchRequest = new SearchRequest("sub_bank1031");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("cityId", "511000"));
    searchSourceBuilder.size(2);
    // id was dynamically mapped as text, and an analyzed field cannot be used for sorting,
    // so sort on the keyword sub-field (multi-field) of id instead
    searchSourceBuilder.sort(new FieldSortBuilder("id.keyword").order(SortOrder.ASC));
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = highLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    if (searchHits.length > 0) {
        // print the first page
        for (SearchHit s : searchHits) {
            System.out.println(s.getSourceAsString());
        }
        // the sort values of the last hit act as the cursor for the next page
        Object[] lastSortValues = searchHits[searchHits.length - 1].getSortValues();
        searchSourceBuilder.searchAfter(lastSortValues);
        searchRequest.source(searchSourceBuilder);
        searchResponse = highLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // print the second page, taken from the new response
        searchHits = searchResponse.getHits().getHits();
        for (SearchHit s : searchHits) {
            System.out.println(s.getSourceAsString());
        }
    }
}
The figure below is a partial screenshot of the index mapping:
PS:
Compared with scroll, search after is simpler, and it is stateless.
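The stateless nature of search after can be illustrated with a small in-memory sketch. It does not talk to Elasticsearch. The class `SearchAfterSketch`, its methods, and the sample ids are hypothetical and only mimic the idea: the caller carries the last sort value of one page over to the next request, so nothing has to be remembered on the server side between pages. It assumes the sort values are unique and already in ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory illustration of the search after idea; it does not call Elasticsearch.
class SearchAfterSketch {

    // Returns up to `size` ids that sort strictly after `after`
    // (null means "start from the beginning").
    // `sortedIds` must be in ascending order and contain unique values.
    static List<String> nextPage(List<String> sortedIds, String after, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> page = new ArrayList<>();
        for (String id : sortedIds) {
            if (after != null && id.compareTo(after) <= 0) {
                continue; // skip everything up to and including the cursor
            }
            page.add(id);
            if (page.size() == size) {
                break;
            }
        }
        return page;
    }

    // Walks every page by feeding the last id of each page back in as the cursor.
    static List<List<String>> allPages(List<String> sortedIds, int size) {
        List<List<String>> pages = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = nextPage(sortedIds, cursor, size);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            cursor = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a1", "a2", "b1", "b2", "c1");
        for (List<String> page : allPages(ids, 2)) {
            System.out.println(page);
        }
    }
}
```

With the five sample ids and a page size of 2, the loop produces three pages. The only thing passed between calls is the cursor value, which is the role the sort values of the last hit play in a real search after request.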