Q: Elasticsearch - how to order buckets by a keyword field

Stack Overflow user

Asked 2021-02-02 00:31:21

1 answer · 301 views · 0 followers · 1 vote

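I've run into a problem: I need to sort buckets by a keyword field, and I have tried two approaches so far.

I'm trying to sort my aggregation results (buckets) by a value taken from a top_hits aggregation. My top_hits returns a single element, the username: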

"user_data": {
          "top_hits": {
            "_source": {
              "includes": ["username"]
            },
            "size": 1
          }
        },

To sort the buckets, I tried bucket_sort, like this:

"sorting": {
          "bucket_sort": {
            "sort": [
              {
                "user_data>username": {    ----> This is the error
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "size": 25
          }
        }

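But I get a syntax error; essentially, the buckets path is invalid.

The other approach I tried was to add another aggregation on username to get its max value, like this: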

"to_sort" : {
          "max": {
            "field": "username"
          }
        }

and then use the following bucket_sort:

"sorting": {
          "bucket_sort": {
          "sort": [
              {
                "to_sort": {    
                "order": "desc"
              }
            }
            ],
            "from": 0,
            "size": 25
          }
        }

But the max aggregation can't be used on a keyword field. Is there a way to sort my buckets by username, which is a keyword field?

The parent of my aggregation is:

"aggs": {
    "CountryId": {
      "terms": {
        "field": "countryId",
        "size": 10000
      }

The username value is different in each bucket.

The bucket results currently look like this:

"buckets" : [
        {
          "key" : "11111",
          "doc_count" : 17,
          "user_data" : {
            "hits" : {
              "total" : 10,
              "max_score" : 11,
              "hits" : [
                {
                  "_index" : "index_name",
                  "_type" : "index_name",
                  "_id" : "101010",
                  "_score" : 0.0,
                  "_source" : {
                    "username" : "cccccc"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "33333",
          "doc_count" : 17,
          "user_data" : {
            "hits" : {
              "total" : 10,
              "max_score" : 11,
              "hits" : [
                {
                  "_index" : "index_name",
                  "_type" : "index_name",
                  "_id" : "101010",
                  "_score" : 0.0,
                  "_source" : {
                    "username" : "bbbbb"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "22222",
          "doc_count" : 17,
          "user_data" : {
            "hits" : {
              "total" : 10,
              "max_score" : 11,
              "hits" : [
                {
                  "_index" : "index_name",
                  "_type" : "index_name",
                  "_id" : "101010",
                  "_score" : 0.0,
                  "_source" : {
                    "username" : "aaaaa"
                  }
                }
              ]
            }
          }
        }
]

and this is the bucket result I want:

"buckets" : [
        {
          "key" : "22222",
          "doc_count" : 17,
          "user_data" : {
            "hits" : {
              "total" : 10,
              "max_score" : 11,
              "hits" : [
                {
                  "_index" : "index_name",
                  "_type" : "index_name",
                  "_id" : "101010",
                  "_score" : 0.0,
                  "_source" : {
                    "username" : "aaaaa"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "33333",
          "doc_count" : 17,
          "user_data" : {
            "hits" : {
              "total" : 10,
              "max_score" : 11,
              "hits" : [
                {
                  "_index" : "index_name",
                  "_type" : "index_name",
                  "_id" : "101010",
                  "_score" : 0.0,
                  "_source" : {
                    "username" : "bbbbb"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "11111",
          "doc_count" : 17,
          "user_data" : {
            "hits" : {
              "total" : 10,
              "max_score" : 11,
              "hits" : [
                {
                  "_index" : "index_name",
                  "_type" : "index_name",
                  "_id" : "101010",
                  "_score" : 0.0,
                  "_source" : {
                    "username" : "ccccc"
                  }
                }
              ]
            }
          }
        }
]

As you can see, these buckets are ordered by username.

elasticsearch

kibana

elasticsearch-aggregation

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-02-13 10:13:36

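I had a similar problem and couldn't find an answer anywhere online, so I built my own solution, which took me almost a week. It won't always work, because of the limits on generating an ordered hash code for strings: you have to choose your own charset and decide how many leading characters of the string to use for sorting (6 in my case). Run some tests, because the hash must stay within the positive range of the long type or it won't work at all (with my charset length I could go up to 13 characters).

Basically, I used a scripted_metric to build the metric for the bucket_sort, based on an existing approach for manually finding top_hits, and tweaked it to compute an ordered hash code of the keyword. Below is my query, where I sort users' last sessions by the sso.name keyword; you should be able to adapt it more or less directly to your problem.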

{
  "size": 0,
  "timeout": "60s",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "user_id"
          }
        }
      ]
    }
  },
  "aggregations": {
    "by_user": {
      "terms": {
        "field": "user_id",
        "size": 10000,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "my_top_hits_sso_ordered_hash": {
          "scripted_metric": {
            "init_script": "state.timestamp_latest = 0L; state.last_sso_ordered_hash = 0L",
            "map_script": """ 
              def current_date = doc['login_timestamp'].getValue().toInstant().toEpochMilli();
              if (current_date > state.timestamp_latest) {
                state.timestamp_latest = current_date;
                state.last_sso_ordered_hash = 0L;
                if(doc['sso.name'].size()>0) {
                  String charset = "abcdefghijklmnopqrstuvwxyz";
                  String ssoName = doc['sso.name'].value;
                  int length = charset.length(); 
                  for(int i = 0; i<Math.min(ssoName.length(), 6); i++) {
                    state.last_sso_ordered_hash = state.last_sso_ordered_hash*length + charset.indexOf(String.valueOf(ssoName.charAt(i))) + 1;
                  }
                }
              }
            """,
            "combine_script":"return state",
            "reduce_script": """ 
              def last_sso_ordered_hash = 0L; // numeric default so bucket_sort always receives a number
              def timestamp_latest = 0L;
              for (s in states) {
                if (s.timestamp_latest > (timestamp_latest)) {
                  timestamp_latest = s.timestamp_latest; last_sso_ordered_hash = s.last_sso_ordered_hash;
                }
              }
              return last_sso_ordered_hash;
            """
          }
        },
        "user_last_session": {
          "top_hits": {
            "from": 0,
            "size": 1,
            "sort": [
              {
                "login_timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "pagination": {
          "bucket_sort": {
            "sort": [
              {
                "my_top_hits_sso_ordered_hash.value": {
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "size": 100
          }
        }
      }
    }
  }
}
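The hashing loop in the map_script is easier to test outside Elasticsearch. The following is a minimal Python sketch that mirrors that loop under the same assumptions (a lowercase a-z charset, the first 6 characters). The names `ordered_hash` and `max_prefix_len` are hypothetical helpers, not part of the query above.

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz"
LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit long


def ordered_hash(name, prefix_len=6, charset=CHARSET):
    """Mirror of the map_script loop: fold the first `prefix_len` characters
    into one integer. Characters outside the charset contribute 0, because
    find() returns -1 just like Java's indexOf()."""
    base = len(charset)
    value = 0
    for ch in name[:prefix_len]:
        value = value * base + charset.find(ch) + 1
    return value


def max_prefix_len(charset=CHARSET):
    """Longest prefix whose largest possible hash still fits in a positive long."""
    base = len(charset)
    worst = 0  # hash of a prefix made only of the last charset character
    length = 0
    while worst * base + base <= LONG_MAX:
        worst = worst * base + base
        length += 1
    return length


hashes = [ordered_hash(n) for n in ("aaaaa", "bbbbb", "ccccc")]
```

With this charset, `max_prefix_len()` gives 13, which matches the limit mentioned in the answer. The sketch also makes one limitation visible: strings of different lengths do not always compare alphabetically (`ordered_hash("b")` is 2 while `ordered_hash("aa")` is 27), and names that share their first 6 characters get the same hash. Note too that the query sorts with "order": "desc"; for the ascending username order asked for in the question, "asc" would be the value to use.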

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/66002081
