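Copyright notice: reposting is welcome; please credit the source. Thank you. https://blog.csdn.net/boling_cavalry/article/details/89735952
Aggregation is a feature we use often when working with Elasticsearch. Starting with this article, we will learn and master aggregations through hands-on practice.
With a search, we find the set of documents that match the query conditions.
With an aggregation, we get a summary view of the data. Taking car sales records as an example, the following are all aggregated data:
The first step in learning Elasticsearch aggregations is to understand two concepts: buckets and metrics.
A bucket is a set of documents that meet a given condition. For example, when cars are classified by color, as in the figure below, each color has its own bucket that holds all documents of that color:
A metric is a statistic computed over the documents in a bucket, such as the number of red cars sold, their lowest price, highest price, average price, or total sales. All of these are computed from the field values of the documents in the bucket.
With the basic concepts in place, let's move on to the hands-on part.
The environment for this exercise is listed below; please make sure your Elasticsearch instance is running properly:
The data used in this exercise comes from the examples in "Elasticsearch: The Definitive Guide".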
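To make the idea of a metric concrete, here is a minimal Python sketch that computes the metrics listed above for a single bucket. It does not call Elasticsearch; it only mirrors the arithmetic, using the prices of the four red cars from this article's sample data. The names `red_prices` and `red_metrics` are illustrative only.

```python
from statistics import mean

# Prices of the four red cars in this article's sample data.
red_prices = [10000, 20000, 20000, 80000]

# Each metric is a single number computed from the documents in one bucket.
red_metrics = {
    "doc_count": len(red_prices),     # how many red cars were sold
    "min_price": min(red_prices),     # lowest price
    "max_price": max(red_prices),     # highest price
    "avg_price": mean(red_prices),    # average price
    "total_sales": sum(red_prices),   # total sales
}

print(red_metrics)
```

The bucket decides which documents are included, and the metric decides what number is computed from them.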
Field | Type | Purpose |
---|---|---|
price | long | Sale price of the car |
color | keyword | Color of the car |
make | keyword | Make (brand) of the car |
sold | date | Sale date |
PUT /cars
{
"mappings" : {
"transactions" : {
"properties" : {
"color" : {
"type" : "keyword"
},
"make" : {
"type" : "keyword"
},
"price" : {
"type" : "long"
},
"sold" : {
"type" : "date"
}
}
}
}
}
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
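The request body above is newline-delimited JSON: each document is preceded by an action line, and the body has to end with a newline. If you would rather generate that body than type it by hand, a minimal Python sketch could look like the following. The helper name `build_bulk_body` is hypothetical, and the sketch only builds the string; sending it to Elasticsearch is left out.

```python
import json

# The eight sample documents from this article.
docs = [
    {"price": 10000, "color": "red", "make": "honda", "sold": "2014-10-28"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 30000, "color": "green", "make": "ford", "sold": "2014-05-18"},
    {"price": 15000, "color": "blue", "make": "toyota", "sold": "2014-07-02"},
    {"price": 12000, "color": "green", "make": "toyota", "sold": "2014-08-19"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 80000, "color": "red", "make": "bmw", "sold": "2014-01-01"},
    {"price": 25000, "color": "blue", "make": "ford", "sold": "2014-02-12"},
]


def build_bulk_body(documents):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line, ID left to Elasticsearch
        lines.append(json.dumps(doc))            # document source line
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(docs)
print(bulk_body)
```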
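Viewed through the head plugin, all the data in the newly created cars index looks like the figure below. The first record, for example, shows a price of 30000, the color green, the make ford, and a sale date of May 18, 2014:
The first aggregation is the terms bucket, which is comparable to GROUP BY in SQL: it groups all records by color. Run the following query: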
GET /cars/transactions/_search
{
"size":0,
"aggs":{
"popular_colors":{
"terms": {
"field": "color"
}
}
}
}
The response is as follows:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 8,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"popular_colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "red",
"doc_count" : 4
},
{
"key" : "blue",
"doc_count" : 2
},
{
"key" : "green",
"doc_count" : 2
}
]
}
}
}
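For readers who think in code, the bucketing that the terms aggregation performs here can be sketched in plain Python. This is only an in-memory illustration over the eight sample documents, not how Elasticsearch implements it. The sort order (doc_count descending, then key ascending for ties) is chosen to match the response shown above.

```python
from collections import Counter

# The color field of the eight sample documents, in insertion order.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]

# Count how many documents fall into each color bucket.
counts = Counter(colors)

# Order buckets by doc_count (largest first), breaking ties by key.
buckets = [
    {"key": key, "doc_count": count}
    for key, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
]

print(buckets)
```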
The parameters of the query command are explained as follows:
"aggregations" : {
"popular_colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "red",
"doc_count" : 4
},
{
"key" : "blue",
"doc_count" : 2
},
...
GET /cars/transactions/_search
{
"size":0,
"aggs":{
"colors":{
"terms": {
"field": "color"
},
"aggs":{
"sales":{
"sum":{
"field":"price"
}
}
}
}
}
}
The response is as follows:
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 8,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "red",
"doc_count" : 4,
"sales" : {
"value" : 130000.0
}
},
{
"key" : "blue",
"doc_count" : 2,
"sales" : {
"value" : 40000.0
}
},
{
"key" : "green",
"doc_count" : 2,
"sales" : {
"value" : 42000.0
}
}
]
}
}
}
GET /cars/transactions/_search
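{
"size":0,
"aggs":{ ------same as before: declares the aggregation
"colors":{ ------name of the aggregation
"terms": { ------bucket type: group by the values of a field
"field": "color" ------group by the color field
},
"aggs":{ ------new nested aggs object, applied to the documents inside each bucket
"sales":{ ------name of the metric
"sum":{ ------metric type: sum of a field
"field":"price" ------the field to sum is price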
}
}
}
}
}
}
"aggregations" : { ------aggregation results
"colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ------each object in this JSON array represents one bucket
{
"key" : "red", ------this bucket collects all documents whose color is red
"doc_count" : 4, ------there are 4 documents whose color is red
"sales" : { ------holds the result of the sum metric
"value" : 130000.0 ------total sales of all cars whose color is red
}
},
{
"key" : "blue",
"doc_count" : 2,
"sales" : {
"value" : 40000.0 ------total sales of all cars whose color is blue
}
},
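As a cross-check of the numbers above, the same "bucket by color, then sum price inside each bucket" logic can be sketched in plain Python. Again, this is only an in-memory illustration over the sample data under the assumption that each document contributes its price once; the variable names are illustrative.

```python
from collections import defaultdict

# (color, price) pairs of the eight sample documents.
sales = [
    ("red", 10000), ("red", 20000), ("green", 30000), ("blue", 15000),
    ("green", 12000), ("red", 20000), ("red", 80000), ("blue", 25000),
]

doc_counts = defaultdict(int)  # bucket size per color
totals = defaultdict(float)    # sum of price per color

for color, price in sales:
    doc_counts[color] += 1     # terms bucket: one bucket per color
    totals[color] += price     # sum metric: computed inside that bucket

# Same shape and ordering as the "buckets" array in the response.
buckets = [
    {"key": color, "doc_count": doc_counts[color], "sales": {"value": totals[color]}}
    for color in sorted(doc_counts, key=lambda c: (-doc_counts[c], c))
]

print(buckets)
```

The totals agree with the response: 130000.0 for red, 40000.0 for blue, and 42000.0 for green.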
That completes the basic aggregation operations in Elasticsearch 6. The following articles will cover more complex aggregations.