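As a distributed search engine, Elasticsearch stores and quickly queries many kinds of data (structured and unstructured text, numeric values, and so on), and it scales well as data volume grows. Beyond search across many scenarios, it also offers powerful aggregation queries that can meet complex data-analysis needs. This article covers the aggregation types most commonly used in ES, then walks through concrete features from a business system as examples.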
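Bucket aggregations divide documents into sets (buckets) according to a rule and report the number of documents in each bucket. They can be nested: the documents in each bucket can be bucketed again by sub-aggregations. There are many types (Adjacency matrix, Children, Composite, Date histogram, Filter, Sampler, Terms, and others), each with its own bucketing strategy. Depending on the type, an aggregation defines a single bucket, a fixed number of buckets, or creates buckets dynamically while the aggregation runs.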
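The three bucketing strategies can be sketched as one request body. This is a minimal illustration written as a Python dict rather than a definitive query: `leadsTouchTag` and `ownerId` are fields taken from the examples later in this article, while `dealAmount` and all aggregation names are hypothetical.

```python
import json

# Hypothetical names: "dealAmount" and the aggregation names are invented
# for illustration; "leadsTouchTag" and "ownerId" come from the article.
bucket_strategies = {
    "size": 0,  # return aggregation results only, no search hits
    "aggs": {
        # Single bucket: every document matching the filter lands in one bucket.
        "tag_3_only": {"filter": {"term": {"leadsTouchTag": 3}}},
        # Fixed number of buckets: one bucket per declared range.
        "amount_ranges": {
            "range": {
                "field": "dealAmount",
                "ranges": [
                    {"to": 100},
                    {"from": 100, "to": 1000},
                    {"from": 1000},
                ],
            }
        },
        # Dynamic buckets: one bucket per distinct value found at query time,
        # capped at `size` buckets.
        "by_owner": {"terms": {"field": "ownerId", "size": 5}},
    },
}

request_json = json.dumps(bucket_strategies, indent=2)
print(request_json)
```

The printed JSON can be sent as the body of a `_search` request with whichever client is in use.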
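Metrics aggregations compute statistics over document data; the input can be an existing field or the result of a script. Types include Avg, Cardinality, Geo-bounds, Max, Rate, Scripted metric, Top hits, and more. Numeric metrics aggregations are a special kind of metrics aggregation that output either a single value or multiple values. They can be used as sub-aggregations of bucket aggregations, and some bucket aggregations can sort their buckets by a metric computed in each bucket. A metrics aggregation, however, cannot contain sub-aggregations of its own.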
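As a sketch of metrics used as sub-aggregations, the helper below builds a body in which a terms aggregation sorts its buckets by a single-value metric and also carries a multi-value `stats` metric. The helper name, the aggregation names, and the `callDuration` field are assumptions made for illustration.

```python
import json

def top_owners_by_metric(metric_field, size=3):
    """Build a body that ranks ownerId buckets by the average of metric_field."""
    return {
        "size": 0,
        "aggs": {
            "by_owner": {
                "terms": {
                    "field": "ownerId",
                    "size": size,
                    # Sort buckets by the single-value metric defined below.
                    "order": {"avg_metric": "desc"},
                },
                "aggs": {
                    # Single-value metric: one number per bucket.
                    "avg_metric": {"avg": {"field": metric_field}},
                    # Multi-value metric: count, min, max, avg and sum per bucket.
                    "metric_stats": {"stats": {"field": metric_field}},
                },
            }
        },
    }

body = top_owners_by_metric("callDuration")  # "callDuration" is hypothetical
order_key = next(iter(body["aggs"]["by_owner"]["terms"]["order"]))
print(json.dumps(body, indent=2))
```

Neither metric has an `aggs` key of its own, which reflects the rule that metrics aggregations are leaves of the aggregation tree.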
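Pipeline aggregations compute over the output of other aggregations rather than over the documents in the index, and add their results to the output tree. There are many types, which fall into two families:
Parent: takes the output of its parent aggregation and computes new buckets, or new values that are added to the existing buckets.
Sibling: takes the output of a sibling aggregation and computes a new aggregation at the same level as that sibling, so the final result contains both the input and the new output.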
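The difference between the two families can be sketched with one parent pipeline (`cumulative_sum`) and one sibling pipeline (`max_bucket`), both reading the per-bucket document count through `buckets_path`. The aggregation names are hypothetical, and `calendar_interval` assumes Elasticsearch 7.2 or later (earlier versions use `interval`). The second half reproduces the arithmetic locally on the daily counts 1, 65, 48, 148 to show where each result would appear.

```python
from itertools import accumulate

pipeline_body = {
    "size": 0,
    "aggs": {
        "per_day": {
            # Assumes 7.2+; older versions take "interval" instead.
            "date_histogram": {"field": "createdTime", "calendar_interval": "day"},
            "aggs": {
                # Parent pipeline: lives inside per_day and adds a running
                # total to every existing bucket.
                "running_total": {"cumulative_sum": {"buckets_path": "_count"}}
            },
        },
        # Sibling pipeline: sits next to per_day and yields one new result
        # alongside it.
        "busiest_day": {"max_bucket": {"buckets_path": "per_day>_count"}},
    },
}

# Local illustration of what the two pipelines compute.
daily_doc_counts = [1, 65, 48, 148]
running_total = list(accumulate(daily_doc_counts))  # one value per bucket
busiest_day_count = max(daily_doc_counts)           # one value overall
print(running_total, busiest_day_count)
```

The `>` in `per_day>_count` separates aggregation names along the path, and `_count` is the special path that refers to a bucket's document count.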
The following examples from a business system illustrate some common use cases and how to implement them.
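Example 1: daily document counts. The query keeps one account's documents created between 2022-11-10 and 2022-11-13, then buckets them by day with date_histogram; min_doc_count of 0 together with extended_bounds makes empty days up to 2022-11-15 appear as zero-count buckets. The interval parameter shown here is the form used before Elasticsearch 7.2; later versions deprecate it in favor of calendar_interval or fixed_interval.
GET my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "createdTime": {
              "gte": "2022-11-10 00:00:00",
              "lte": "2022-11-13 00:00:00"
            }
          }
        },
        {
          "term": {
            "accountId": 1223445
          }
        }
      ]
    }
  },
  "aggs": {
    "agg_name1": {
      "date_histogram": {
        "field": "createdTime",
        "interval": "day",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-11-10",
          "max": "2022-11-15"
        }
      }
    }
  }
}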
Response:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 276,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "agg_name1": {
      "buckets": [
        {
          "key_as_string": "2022-11-10 00:00:00",
          "key": 1668038400000,
          "doc_count": 1
        },
        {
          "key_as_string": "2022-11-11 00:00:00",
          "key": 1668124800000,
          "doc_count": 65
        },
        {
          "key_as_string": "2022-11-12 00:00:00",
          "key": 1668211200000,
          "doc_count": 48
        },
        {
          "key_as_string": "2022-11-13 00:00:00",
          "key": 1668297600000,
          "doc_count": 148
        },
        {
          "key_as_string": "2022-11-14 00:00:00",
          "key": 1668384000000,
          "doc_count": 0
        },
        {
          "key_as_string": "2022-11-15 00:00:00",
          "key": 1668470400000,
          "doc_count": 0
        }
      ]
    }
  }
}
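On the application side, a date_histogram response like the one above is usually reduced to a date-to-count mapping. The sketch below assumes the response has already been parsed into a Python dict; `daily_counts` and `sample_response` are hypothetical names, and the sample keeps only three of the six buckets.

```python
from datetime import datetime, timezone

def daily_counts(response, agg_name):
    """Map each date_histogram bucket to {ISO date: doc_count}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    result = {}
    for bucket in buckets:
        # "key" is the bucket start in epoch milliseconds (UTC).
        day = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc).date()
        result[day.isoformat()] = bucket["doc_count"]
    return result

# Trimmed copy of the response above (three of the six buckets).
sample_response = {
    "aggregations": {
        "agg_name1": {
            "buckets": [
                {"key_as_string": "2022-11-10 00:00:00", "key": 1668038400000, "doc_count": 1},
                {"key_as_string": "2022-11-11 00:00:00", "key": 1668124800000, "doc_count": 65},
                {"key_as_string": "2022-11-14 00:00:00", "key": 1668384000000, "doc_count": 0},
            ]
        }
    }
}

counts = daily_counts(sample_response, "agg_name1")
print(counts)
```

Reading the numeric `key` rather than `key_as_string` keeps the parsing independent of the date format configured in the index mapping.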
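Example 2: two-level terms aggregation. The outer terms aggregation keeps the two owners (ownerId) with the most documents, and the sub-aggregation counts each owner's documents per leadsTouchTag value.
GET my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "createdTime": {
              "gte": "2022-11-15 00:00:00",
              "lte": "2022-12-01 00:00:00"
            }
          }
        },
        {
          "term": {
            "accountId": 12345644
          }
        }
      ]
    }
  },
  "aggs": {
    "aggs_name1": {
      "terms": {
        "field": "ownerId",
        "size": 2
      },
      "aggs": {
        "aggs_sub_name1": {
          "terms": {
            "field": "leadsTouchTag",
            "size": 10
          }
        }
      }
    }
  }
}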
Response:
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 468,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "aggs_name1": {
      "doc_count_error_upper_bound": 18,
      "sum_other_doc_count": 399,
      "buckets": [
        {
          "key": 24994363,
          "doc_count": 35,
          "aggs_sub_name1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 0,
                "doc_count": 17
              },
              {
                "key": 3,
                "doc_count": 14
              },
              {
                "key": 1,
                "doc_count": 3
              },
              {
                "key": 2,
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": 24427834,
          "doc_count": 34,
          "aggs_sub_name1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 3,
                "doc_count": 16
              },
              {
                "key": 0,
                "doc_count": 15
              },
              {
                "key": 1,
                "doc_count": 3
              }
            ]
          }
        }
      ]
    }
  }
}
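A nested terms response like the one above is often flattened into rows before it is shown in a report. The sketch below does that and also checks the relationship between `hits.total`, the returned buckets, and `sum_other_doc_count`. The function names are hypothetical, and the code assumes `hits.total` is a plain number as in the response above (newer Elasticsearch versions return an object there).

```python
def flatten_owner_tags(response):
    """Turn owner -> tag buckets into (ownerId, leadsTouchTag, doc_count) rows."""
    rows = []
    for owner in response["aggregations"]["aggs_name1"]["buckets"]:
        for tag in owner["aggs_sub_name1"]["buckets"]:
            rows.append((owner["key"], tag["key"], tag["doc_count"]))
    return rows

def docs_outside_buckets(response):
    """Documents matched by the query but not covered by the returned owner buckets."""
    total = response["hits"]["total"]  # assumes the plain-number form shown above
    covered = sum(b["doc_count"] for b in response["aggregations"]["aggs_name1"]["buckets"])
    return total - covered

# Values copied from the response above, with the per-level error fields omitted.
sample_response = {
    "hits": {"total": 468},
    "aggregations": {
        "aggs_name1": {
            "sum_other_doc_count": 399,
            "buckets": [
                {"key": 24994363, "doc_count": 35, "aggs_sub_name1": {"buckets": [
                    {"key": 0, "doc_count": 17}, {"key": 3, "doc_count": 14},
                    {"key": 1, "doc_count": 3}, {"key": 2, "doc_count": 1}]}},
                {"key": 24427834, "doc_count": 34, "aggs_sub_name1": {"buckets": [
                    {"key": 3, "doc_count": 16}, {"key": 0, "doc_count": 15},
                    {"key": 1, "doc_count": 3}]}},
            ],
        }
    },
}

rows = flatten_owner_tags(sample_response)
outside = docs_outside_buckets(sample_response)
print(rows, outside)
```

Because the outer aggregation asks for only two owners, most matching documents fall outside the returned buckets; a larger `size` would cover more of them.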
Example 3: metrics inside a nested aggregation. For the three owners with the most documents, the nested aggregation steps into the callStatInfo nested objects, where stats summarizes totalCallOutDuration and avg averages totalCallOutNum.
GET my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "createdTime": {
              "gte": "2022-11-01 00:00:00",
              "lte": "2022-12-01 00:00:00"
            }
          }
        },
        {
          "term": {
            "accountId": 312353212
          }
        }
      ]
    }
  },
  "aggs": {
    "agg_name1": {
      "terms": {
        "field": "ownerId",
        "size": 3
      },
      "aggs": {
        "aggs_nested_name1": {
          "nested": {
            "path": "callStatInfo"
          },
          "aggs": {
            "aggs_sub_name1": {
              "stats": {
                "field": "callStatInfo.totalCallOutDuration"
              }
            },
            "aggs_sub_name2": {
              "avg": {
                "field": "callStatInfo.totalCallOutNum"
              }
            }
          }
        }
      }
    }
  }
}
Response:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 898,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "agg_name1": {
      "doc_count_error_upper_bound": 26,
      "sum_other_doc_count": 728,
      "buckets": [
        {
          "key": 24994363,
          "doc_count": 58,
          "aggs_nested_name1": {
            "doc_count": 32,
            "aggs_sub_name1": {
              "count": 32,
              "min": 0,
              "max": 207,
              "avg": 38.5,
              "sum": 1232
            },
            "aggs_sub_name2": {
              "value": 2.4375
            }
          }
        },
        {
          "key": 24427834,
          "doc_count": 57,
          "aggs_nested_name1": {
            "doc_count": 28,
            "aggs_sub_name1": {
              "count": 28,
              "min": 0,
              "max": 182,
              "avg": 56.75,
              "sum": 1589
            },
            "aggs_sub_name2": {
              "value": 2.1785714285714284
            }
          }
        },
        {
          "key": 22858878,
          "doc_count": 55,
          "aggs_nested_name1": {
            "doc_count": 0,
            "aggs_sub_name1": {
              "count": 0,
              "min": null,
              "max": null,
              "avg": null,
              "sum": null
            },
            "aggs_sub_name2": {
              "value": null
            }
          }
        }
      ]
    }
  }
}
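When consuming a response like the one above, the empty bucket needs care: an owner with no callStatInfo objects gets null metrics, which arrive in Python as `None`. The following sketch, with the hypothetical helper `owner_call_summary` and a trimmed two-bucket copy of the response, shows one way to read the values defensively. Note that the bucket's own `doc_count` counts parent documents, while the `doc_count` under the nested aggregation counts nested objects.

```python
import json

# Trimmed copy of the response above: the first and the last owner bucket.
raw = """
{
  "aggregations": {"agg_name1": {"buckets": [
    {"key": 24994363, "doc_count": 58,
     "aggs_nested_name1": {"doc_count": 32,
       "aggs_sub_name1": {"count": 32, "min": 0, "max": 207, "avg": 38.5, "sum": 1232},
       "aggs_sub_name2": {"value": 2.4375}}},
    {"key": 22858878, "doc_count": 55,
     "aggs_nested_name1": {"doc_count": 0,
       "aggs_sub_name1": {"count": 0, "min": null, "max": null, "avg": null, "sum": null},
       "aggs_sub_name2": {"value": null}}}
  ]}}
}
"""

def owner_call_summary(response):
    """Collect per-owner call statistics, treating empty buckets as zero totals."""
    rows = []
    for bucket in response["aggregations"]["agg_name1"]["buckets"]:
        nested = bucket["aggs_nested_name1"]
        stats = nested["aggs_sub_name1"]
        rows.append({
            "ownerId": bucket["key"],
            "documents": bucket["doc_count"],        # parent documents
            "nested_objects": nested["doc_count"],   # callStatInfo objects
            # With no values the sum is null; report 0 instead.
            "total_duration": stats["sum"] if stats["count"] else 0,
            # An average over nothing stays None rather than a misleading 0.
            "avg_duration": stats["avg"],
            "avg_call_num": nested["aggs_sub_name2"]["value"],
        })
    return rows

summary = owner_call_summary(json.loads(raw))
for row in summary:
    print(row)
```

Keeping the averages as `None` lets the caller decide how to display "no data", instead of mixing it up with a genuine average of zero.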
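ES provides powerful aggregation queries that handle complex statistical queries with good performance. For business systems whose data volume is not especially large, ES is also a good choice for real-time statistical analysis.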
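Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, contact cloudcommunity@tencent.com for removal.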