ElasticSearch Java API - Aggregation Queries

Take player information as an example: the player type of the player index has five fields: name, age, salary, team, and position.

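The index mapping is shown below. Note that, as published, it describes a quote type with stock-price fields rather than the player type described above: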

    "mappings": {
      "quote": {
        "properties": {
          "adj_close": { "type": "long" },
          "open":      { "type": "long" },
          "symbol":    { "index": "not_analyzed", "type": "string" },
          "volume":    { "type": "long" },
          "high":      { "type": "long" },
          "low":       { "type": "long" },
          "date":      { "format": "strict_date_optional_time||epoch_millis", "type": "date" },
          "close":     { "type": "long" }
        },
        "_all": { "enabled": false }
      }
    }

All data in the index:

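| name     | age | salary | team | position |
|----------|-----|--------|------|----------|
| james    | 33  | 3000   | cav  | sf       |
| irving   | 25  | 2000   | cav  | pg       |
| curry    | 29  | 1000   | war  | pg       |
| thompson | 26  | 2000   | war  | sg       |
| green    | 26  | 2000   | war  | pf       |
| garnett  | 40  | 1000   | tim  | pf       |
| towns    | 21  | 500    | tim  | c        |
| lavin    | 21  | 300    | tim  | sg       |
| wigins   | 20  | 500    | tim  | sf       |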

First, initialize the search request builder:

SearchRequestBuilder sbuilder = client.prepareSearch("player").setTypes("player");

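The examples below show how to implement the common aggregations. In the ES API, aggregating over several fields requires sub-aggregations (subAggregation), which beginners may struggle to find: there is little material about it online, and I wrestled with this for two days before reading the source code made it clear T_T. The multi-field case is therefore explained explicitly below, and sorting the aggregation results is covered in its own section as well.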

1. group by/count

For example, to count the players on each team, the SQL would be:

select team, count(*) as player_count from player group by team;

With the ES Java API:

    // one bucket per team; each bucket's doc count is the player_count
    TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team");
    sbuilder.addAggregation(teamAgg);
    SearchResponse response = sbuilder.execute().actionGet();
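
To make the expected buckets concrete, the following plain-Java sketch (no Elasticsearch client involved) computes the same group-by count over the nine sample rows in the table above. The class and method names are hypothetical; it only illustrates what the terms aggregation should report as each team bucket's doc count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java sketch, no Elasticsearch involved: computes over the sample rows
// the per-team player counts that the "team" terms aggregation reports as
// bucket doc counts. Class and method names are hypothetical.
class TeamCountSketch {
    // The team column of the sample table, one entry per player, in table order.
    static final List<String> TEAMS = Arrays.asList(
            "cav", "cav",                // james, irving
            "war", "war", "war",         // curry, thompson, green
            "tim", "tim", "tim", "tim"); // garnett, towns, lavin, wigins

    // Equivalent of: select team, count(*) as player_count from player group by team
    // (TreeMap keeps the teams in alphabetical order for printing)
    static Map<String, Long> countByTeam(List<String> teams) {
        return teams.stream()
                .collect(Collectors.groupingBy(t -> t, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByTeam(TEAMS)); // {cav=2, tim=4, war=3}
    }
}
```

Against the real index the same three numbers should come back as the doc counts of the team buckets, although ES orders terms buckets by doc count, descending, by default rather than alphabetically.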

2. group by multiple fields

For example, to count the players at each position on each team, the SQL would be:

select team, position, count(*) as pos_count from player group by team, position;

With the ES Java API:

    // outer terms aggregation: one bucket per team
    TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team");
    // inner terms aggregation: one bucket per position inside each team bucket; its doc count is pos_count
    TermsBuilder posAgg = AggregationBuilders.terms("position").field("position");
    sbuilder.addAggregation(teamAgg.subAggregation(posAgg));
    SearchResponse response = sbuilder.execute().actionGet();
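
The key point is the nesting: subAggregation places a position-level terms aggregation inside every team bucket. The plain-Java sketch below reproduces that shape over the sample rows with nested maps; it does not use Elasticsearch, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, no Elasticsearch involved: the bucket-inside-bucket shape
// that teamAgg.subAggregation(posAgg) produces. Outer key = team, inner key =
// position, inner value = player count. Class and method names are hypothetical.
class TeamPositionSketch {
    // {team, position} for each row of the sample table, in table order
    static final String[][] ROWS = {
        {"cav", "sf"}, {"cav", "pg"},
        {"war", "pg"}, {"war", "sg"}, {"war", "pf"},
        {"tim", "pf"}, {"tim", "c"}, {"tim", "sg"}, {"tim", "sf"}
    };

    // Equivalent of: select team, position, count(*) as pos_count from player group by team, position
    static Map<String, Map<String, Integer>> countByTeamAndPosition(String[][] rows) {
        Map<String, Map<String, Integer>> teams = new TreeMap<>();
        for (String[] row : rows) {
            teams.computeIfAbsent(row[0], k -> new TreeMap<>()) // outer bucket: team
                 .merge(row[1], 1, Integer::sum);               // inner bucket: position
        }
        return teams;
    }

    public static void main(String[] args) {
        // {cav={pg=1, sf=1}, tim={c=1, pf=1, sf=1, sg=1}, war={pf=1, pg=1, sg=1}}
        System.out.println(countByTeamAndPosition(ROWS));
    }
}
```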

3. max/min/sum/avg

For example, to compute the maximum, minimum, total, or average player age for each team (the maximum is shown here), the SQL would be:

select team, max(age) as max_age from player group by team;

With the ES Java API:

    TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team");
    // min/sum/avg work the same way via AggregationBuilders.min(...), .sum(...), .avg(...)
    MaxBuilder ageAgg = AggregationBuilders.max("max_age").field("age");
    sbuilder.addAggregation(teamAgg.subAggregation(ageAgg));
    SearchResponse response = sbuilder.execute().actionGet();
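
The request above only shows max. As a plain-Java cross-check (no Elasticsearch; hypothetical class and method names), IntSummaryStatistics yields all four per-team age statistics from the sample rows in one pass, i.e. what max/min/sum/avg sub-aggregations under the team buckets should return.

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, no Elasticsearch involved: per-team max/min/sum/avg of age
// over the sample rows, i.e. what max/min/sum/avg sub-aggregations under the
// "team" buckets should return. Class and method names are hypothetical.
class TeamAgeStatsSketch {
    // team and age columns of the sample table, in table order
    static final String[] TEAMS = {"cav", "cav", "war", "war", "war", "tim", "tim", "tim", "tim"};
    static final int[] AGES = {33, 25, 29, 26, 26, 40, 21, 21, 20};

    static Map<String, IntSummaryStatistics> ageStatsByTeam(String[] teams, int[] ages) {
        Map<String, IntSummaryStatistics> stats = new TreeMap<>();
        for (int i = 0; i < teams.length; i++) {
            // one statistics accumulator per team "bucket"
            stats.computeIfAbsent(teams[i], k -> new IntSummaryStatistics()).accept(ages[i]);
        }
        return stats;
    }

    public static void main(String[] args) {
        ageStatsByTeam(TEAMS, AGES).forEach((team, s) -> System.out.println(
                team + ": max=" + s.getMax() + " min=" + s.getMin()
                        + " sum=" + s.getSum() + " avg=" + s.getAverage()));
        // cav: max=33 min=25 sum=58 avg=29.0
        // tim: max=40 min=20 sum=102 avg=25.5
        // war: max=29 min=26 sum=81 avg=27.0
    }
}
```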

4. max/min/sum/avg over multiple fields

For example, to compute each team's average player age together with its total salary, the SQL would be:

select team, avg(age) as avg_age, sum(salary) as total_salary from player group by team;

With the ES Java API:

    TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team");
    AvgBuilder ageAgg = AggregationBuilders.avg("avg_age").field("age");
    SumBuilder salaryAgg = AggregationBuilders.sum("total_salary").field("salary");
    // both metrics are sub-aggregations of the same team bucket
    sbuilder.addAggregation(teamAgg.subAggregation(ageAgg).subAggregation(salaryAgg));
    SearchResponse response = sbuilder.execute().actionGet();
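
Both metrics live in the same team bucket, which is why they are chained as two subAggregation calls on one terms builder. A plain-Java sketch of that one-pass, two-metric bucket over the sample rows follows; no Elasticsearch is involved, and the class, field, and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, no Elasticsearch involved: each team bucket carries the
// state for two metrics at once, avg(age) and sum(salary), mirroring
// teamAgg.subAggregation(ageAgg).subAggregation(salaryAgg). Names are hypothetical.
class TeamAvgSumSketch {
    // Per-team bucket holding what both sub-aggregations need
    static final class TeamBucket {
        long docCount;
        long ageSum;
        long totalSalary;

        double avgAge() {
            return (double) ageSum / docCount;
        }
    }

    // team, age and salary columns of the sample table, in table order
    static final String[] TEAMS = {"cav", "cav", "war", "war", "war", "tim", "tim", "tim", "tim"};
    static final int[] AGES = {33, 25, 29, 26, 26, 40, 21, 21, 20};
    static final long[] SALARIES = {3000, 2000, 1000, 2000, 2000, 1000, 500, 300, 500};

    static Map<String, TeamBucket> bucketByTeam() {
        Map<String, TeamBucket> buckets = new TreeMap<>();
        for (int i = 0; i < TEAMS.length; i++) {
            TeamBucket b = buckets.computeIfAbsent(TEAMS[i], k -> new TeamBucket());
            b.docCount++;
            b.ageSum += AGES[i];          // feeds avg_age
            b.totalSalary += SALARIES[i]; // feeds total_salary
        }
        return buckets;
    }

    public static void main(String[] args) {
        bucketByTeam().forEach((team, b) -> System.out.println(
                team + ": avg_age=" + b.avgAge() + ", total_salary=" + b.totalSalary));
        // cav: avg_age=29.0, total_salary=5000
        // tim: avg_age=25.5, total_salary=2300
        // war: avg_age=27.0, total_salary=5000
    }
}
```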

5. Sorting aggregation results

For example, to compute each team's total salary and sort the teams by it in descending order, the SQL would be:

select team, sum(salary) as total_salary from player group by team order by total_salary desc;

With the ES Java API:

    // sort the team buckets by the "total_salary" sub-aggregation, descending (false)
    TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team")
            .order(Terms.Order.aggregation("total_salary", false));
    SumBuilder salaryAgg = AggregationBuilders.sum("total_salary").field("salary");
    sbuilder.addAggregation(teamAgg.subAggregation(salaryAgg));
    SearchResponse response = sbuilder.execute().actionGet();

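Note that the sort order is set on the terms aggregation, not on the metric. The first argument of Order.aggregation is the name of the sub-aggregation to sort by, and the second is a boolean: true for ascending, false for descending.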
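
A plain-Java sketch (no Elasticsearch; hypothetical class and method names) of the ordering this requests on the sample data: sum the salaries per team, then sort the team buckets by that sum in descending order. cav and war tie at 5000; the sketch breaks the tie by team name, which is its own choice and not a claim about how ES orders ties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, no Elasticsearch involved: sum(salary) per team over the
// sample rows, then the team buckets sorted by that sum, descending, mirroring
// Order.aggregation("total_salary", false). Class and method names are hypothetical.
class SortBySalarySketch {
    // team and salary columns of the sample table, in table order
    static final String[] TEAMS = {"cav", "cav", "war", "war", "war", "tim", "tim", "tim", "tim"};
    static final long[] SALARIES = {3000, 2000, 1000, 2000, 2000, 1000, 500, 300, 500};

    static List<Map.Entry<String, Long>> teamsByTotalSalaryDesc() {
        // the "total_salary" sum inside each team bucket
        Map<String, Long> total = new TreeMap<>();
        for (int i = 0; i < TEAMS.length; i++) {
            total.merge(TEAMS[i], SALARIES[i], Long::sum);
        }
        List<Map.Entry<String, Long>> buckets = new ArrayList<>(total.entrySet());
        // descending by total salary; ties broken by team name ascending
        // (a choice made for this sketch only)
        buckets.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(teamsByTotalSalaryDesc()); // [cav=5000, war=5000, tim=2300]
    }
}
```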

6. Number of aggregation results returned

By default, a terms aggregation returns only 10 buckets. To get more, specify size when building the TermsBuilder:

TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team").size(15);

7. Parsing/printing aggregation results

Once you have the response:

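    import java.util.Iterator;
    import java.util.Map;
    import org.elasticsearch.search.aggregations.Aggregation;
    import org.elasticsearch.search.aggregations.bucket.terms.StringTerms;
    import org.elasticsearch.search.aggregations.bucket.terms.Terms.Bucket;
    import org.elasticsearch.search.aggregations.metrics.avg.InternalAvg;
    import org.elasticsearch.search.aggregations.metrics.sum.InternalSum;

    // parses the response of the section 4 request ("team" with "avg_age" and "total_salary")
    Map<String, Aggregation> aggMap = response.getAggregations().asMap();
    StringTerms teamAgg = (StringTerms) aggMap.get("team");
    Iterator<Bucket> teamBucketIt = teamAgg.getBuckets().iterator();
    while (teamBucketIt.hasNext()) {
        Bucket buck = teamBucketIt.next();
        // team name (the bucket key)
        String team = String.valueOf(buck.getKey());
        // number of documents in the bucket
        long count = buck.getDocCount();
        // all sub-aggregations of this bucket, keyed by name
        Map<String, Aggregation> subaggmap = buck.getAggregations().asMap();
        // reading an avg value
        double avg_age = ((InternalAvg) subaggmap.get("avg_age")).getValue();
        // reading a sum value
        double total_salary = ((InternalSum) subaggmap.get("total_salary")).getValue();
        //...
        // max/min are read the same way
    }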

This article is part of the Tencent Cloud self-media sharing program.
