Suppose I have the following data:
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "GEORGE",
"favorite_cars" : [ "honda","Hyundae" ]
}
Whenever I query this data for people whose favorite car is toyota, it returns this:
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}
The result is two records with the name ABC. How can I select only distinct documents? The result I want is this:
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}
Here is my query:
{
"fuzzy_like_this_field" : {
"favorite_cars" : {
"like_text" : "toyota",
"max_query_terms" : 12
}
}
}
I am using ElasticSearch 1.0.0 with the Java API client.
Posted on 2014-07-14 06:03:29
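You can eliminate duplicates using aggregations. With a terms aggregation, the results are grouped by one field, e.g. name. It also provides a count of the occurrences of each value of that field, and sorts the results by this count (descending).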
{
"query": {
"fuzzy_like_this_field": {
"favorite_cars": {
"like_text": "toyota",
"max_query_terms": 12
}
}
},
"aggs": {
"grouped_by_name": {
"terms": {
"field": "name",
"size": 0
}
}
}
}
In addition to hits, the result will also contain buckets, with the unique values in key and the counts in doc_count:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.19178301,
"hits" : [ {
"_index" : "pru",
"_type" : "pru",
"_id" : "vGkoVV5cR8SN3lvbWzLaFQ",
"_score" : 0.19178301,
"_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]}
}, {
"_index" : "pru",
"_type" : "pru",
"_id" : "IdEbAcI6TM6oCVxCI_3fug",
"_score" : 0.19178301,
"_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]}
} ]
},
"aggregations" : {
"grouped_by_name" : {
"buckets" : [ {
"key" : "abc",
"doc_count" : 2
} ]
}
}
}
Be aware that using aggregations can be costly because of the duplicate elimination and the sorting of results.
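Note that the buckets carry only the distinct values and their counts, not whole documents. If one full document per name is what is needed, one option is to collapse the hits on the client side. The following is a minimal sketch in Python, not the Java client code from the question; the function name distinct_by_name is hypothetical, and the response is a trimmed copy of the one shown above. The bucket key appears as "abc" rather than "ABC", most likely because name is an analyzed field, so the sketch deduplicates on the _source value instead of the bucket key.

```python
def distinct_by_name(response):
    """Return the _source of the first hit for each distinct name, in hit order."""
    seen = set()
    distinct = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        name = source["name"]  # exact match on the stored value
        if name not in seen:
            seen.add(name)
            distinct.append(source)
    return distinct


# Trimmed copy of the sample response shown above.
response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "vGkoVV5cR8SN3lvbWzLaFQ",
             "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}},
            {"_id": "IdEbAcI6TM6oCVxCI_3fug",
             "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}},
        ],
    },
    "aggregations": {
        "grouped_by_name": {"buckets": [{"key": "abc", "doc_count": 2}]}
    },
}

# Distinct values as reported by the terms aggregation.
names = [b["key"] for b in response["aggregations"]["grouped_by_name"]["buckets"]]
print(names)

# One document per name, collapsed from the hits.
print(distinct_by_name(response))
```

This only deduplicates the hits that were actually returned, so it is a reasonable fit for small result sets rather than a replacement for the aggregation.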
Posted on 2015-04-09 02:56:25
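@JRL is almost correct. You will need an aggregation in your query. The following returns the top 10000 distinct "favorite_cars" values across your documents, sorted by number of occurrences.
{
"query":{ "match_all":{ } },
"size":0,
"aggs" : {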
"Cars" : {
"terms" : { "field" : "favorite_cars", "order": { "_count": "desc"}, "size":10000 }
}
}
}
It is also worth noting that, to get "McLaren F1" back as a single term rather than "McLaren" and "F1", the "favorite_cars" field must not be analyzed:
"favorite_cars": {
"type": "string",
"index": "not_analyzed"
}
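To see why the mapping matters, here is a rough illustration in Python of how the bucket keys differ between an analyzed and a not_analyzed field. This is a sketch under stated assumptions: analyzed_terms is a simplified stand-in for an analyzer (lowercase, split on non-alphanumeric characters), not Elasticsearch's actual implementation, and the two sample documents are made up for the example.

```python
import re
from collections import Counter


def analyzed_terms(value):
    # Simplified stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", value.lower())


def not_analyzed_terms(value):
    # A not_analyzed field indexes the whole value as a single term.
    return [value]


def term_counts(docs, to_terms):
    """Count, per term, how many documents contain it (like doc_count)."""
    counts = Counter()
    for cars in docs:
        terms = set()
        for car in cars:
            terms.update(to_terms(car))
        counts.update(terms)
    return counts


# Hypothetical favorite_cars values for two documents.
docs = [["McLaren F1", "toyota"], ["McLaren F1"]]

print(term_counts(docs, analyzed_terms))
print(term_counts(docs, not_analyzed_terms))
```

With the analyzed stand-in, "McLaren F1" is counted as two separate terms; with the not_analyzed one it stays a single bucket key.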
https://stackoverflow.com/questions/24508191