
23 Useful Elasticsearch Example Queries

February
Published 2018-12-03 16:03:30

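To illustrate the different query types in Elasticsearch, we will search a collection of book documents with the following fields: title, authors, summary, publish date, and number of reviews.

But first, let's create a new index and index some documents using the bulk API: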

PUT /bookdb_index
    { "settings": { "number_of_shards": 1 }}
POST /bookdb_index/book/_bulk
    { "index": { "_id": 1 }}
    { "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
    { "index": { "_id": 2 }}
    { "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
    { "index": { "_id": 3 }}
    { "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
    { "index": { "_id": 4 }}
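    { "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }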
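
The bulk request body is newline-delimited JSON: each document is preceded by an action line, and the body must end with a newline. As a rough sketch of how such a body could be assembled outside the console, the Python snippet below builds it from a list of (id, document) pairs. The helper name build_bulk_body is made up for illustration, the documents are abbreviated, and actually sending the body to a cluster is left out.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line followed by one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


# Abbreviated versions of two of the book documents (assumption: fields trimmed for brevity).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly", "num_reviews": 20}),
    (2, {"title": "Taming Text: How to Find, Organize, and Manipulate It", "publisher": "manning", "num_reviews": 12}),
]

body = build_bulk_body(books)
print(body)
```

Each document contributes exactly two lines, so a body for N documents has 2N lines.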

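Examples

Basic Match Query

There are two ways to run a basic full-text (match) query: the Search Lite API, which expects all search parameters to be passed in as part of the URL, or a full JSON request body, which lets you use the complete Elasticsearch DSL.

Here is a basic match query that searches for the string "guide" in all fields: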

GET /bookdb_index/book/_search?q=guide
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 1.3278645,
    "_source": {
      "title": "Solr in Action",
      "authors": [
        "trey grainger",
        "timothy potter"
      ],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 1.2871116,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": [
        "clinton gormley",
        "zachary tong"
      ],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "oreilly"
    }
  }
]

The full-body version of this query is shown below and produces the same results as the Search Lite request above.

{
    "query": {
        "multi_match" : {
            "query" : "guide",
            "fields" : ["title", "authors", "summary", "publish_date", "num_reviews", "publisher"]
        }
    }
}

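The multi_match keyword is used in place of the match keyword as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case, we want to query all the fields in the document.

Note: Prior to Elasticsearch 6, you could use the "_all" field to find a match in all fields instead of having to specify each field. The "_all" field works by concatenating all fields into one big field, using a space as the delimiter, and then analyzing and indexing that field. In ES6, this functionality has been deprecated and is disabled by default. If you are interested in creating a custom "_all" field, ES6 provides the "copy_to" parameter. See the Elasticsearch Guide for details.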
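
As a hedged illustration of the copy_to approach mentioned in the note, the Python sketch below builds an ES6-style mapping body in which several text fields are copied into one catch-all field. The field name all_text and the helper mapping_with_copy_to are assumptions made for this example; the "book" type matches the one used in this article.

```python
import json


def mapping_with_copy_to(text_fields, target="all_text"):
    """Build a mapping where each listed text field is also copied into `target`."""
    properties = {name: {"type": "text", "copy_to": target} for name in text_fields}
    # The catch-all field itself is a plain text field (hypothetical name).
    properties[target] = {"type": "text"}
    return {"mappings": {"book": {"properties": properties}}}


mapping = mapping_with_copy_to(["title", "authors", "summary", "publisher"])
print(json.dumps(mapping, indent=2))
```

With a mapping like this, a single match query on all_text would play the role that "_all" used to play.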

The Search Lite API also lets you specify which fields to search. For example, to search for books with the words "in Action" in the title field:

GET /bookdb_index/book/_search?q=title:in action
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "3",
    "_score": 1.6323128,
    "_source": {
      "title": "Elasticsearch in Action",
      "authors": [
        "radu gheorge",
        "matthew lee hinman",
        "roy russo"
      ],
      "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
      "publish_date": "2015-12-03",
      "num_reviews": 18,
      "publisher": "manning"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 1.6323128,
    "_source": {
      "title": "Solr in Action",
      "authors": [
        "trey grainger",
        "timothy potter"
      ],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  }
]

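The full-body DSL, however, gives you more flexibility for creating more complicated queries (as we will see later) and for specifying how you want the results returned. In the example below, we specify the number of results to return, the offset to start from (useful for pagination), the document fields to return, and term highlighting. Note that we use a "match" query instead of a "multi_match" query because we only care about searching the title field.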

POST /bookdb_index/book/_search
{
    "query": {
        "match" : {
            "title" : "in action"
        }
    },
    "size": 2,
    "from": 0,
    "_source": [ "title", "summary", "publish_date" ],
    "highlight": {
        "fields" : {
            "title" : {}
        }
    }
}
[Results]
"hits": {
  "total": 2,
  "max_score": 1.6323128,
  "hits": [
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "3",
      "_score": 1.6323128,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      },
      "highlight": {
        "title": [
          "Elasticsearch <em>in</em> <em>Action</em>"
        ]
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "4",
      "_score": 1.6323128,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      },
      "highlight": {
        "title": [
          "Solr <em>in</em> <em>Action</em>"
        ]
      }
    }
  ]
}

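Note: For multi-word queries, the match query lets you specify whether to use the and operator instead of the default or operator. You can also specify the minimum_should_match option to tweak the relevance of the returned results. Details can be found in the Elasticsearch Guide.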
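
To make the note concrete, here is a small Python sketch that builds the request body for a match query with an explicit operator and an optional minimum_should_match. The helper name match_query is hypothetical; the option names are the ones named in the note.

```python
import json


def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a match query body using the expanded form that accepts options."""
    options = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# Require both "in" and "action" to appear in the title.
strict = match_query("title", "in action", operator="and")
print(json.dumps(strict, indent=2))
```

The resulting body can be posted to the same _search endpoint as the example above.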

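Boosting

Since we are searching across multiple fields, we may want to boost the scores in a certain field. In the contrived example below, we boost scores from the summary field by a factor of 3 in order to increase the importance of the summary field, which in turn increases the relevance of the document with _id 4.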

POST /bookdb_index/book/_search
{
    "query": {
        "multi_match" : {
            "query" : "elasticsearch guide",
            "fields": ["title", "summary^3"]
        }
    },
    "_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": {
  "total": 3,
  "max_score": 3.9835935,
  "hits": [
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "4",
      "_score": 3.9835935,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "3",
      "_score": 3.1001682,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "1",
      "_score": 2.0281231,
      "_source": {
        "summary": "A distibuted real-time search and analytics engine",
        "title": "Elasticsearch: The Definitive Guide",
        "publish_date": "2015-02-07"
      }
    }
  ]
}

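Note: Boosting does not simply multiply the calculated score by the boost factor. The actual boost value that is applied goes through normalization and some internal optimization. For more on how boosting works, see the Elasticsearch Guide.

Bool Query

The AND/OR/NOT operators can be used to fine-tune our search queries in order to provide more relevant or specific results. This is implemented in the search API as a bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to search for a book with the word "Elasticsearch" OR "Solr" in the title, AND authored by "clinton gormley" but NOT authored by "radu gheorge":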

POST /bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": {
        "bool" : { 
          "should": [
            { "match": { "title": "Elasticsearch" }},
            { "match": { "title": "Solr" }} 
          ],
          "minimum_should_match": 1,
          "must": { "match": { "authors": "clinton gormley" }} 
        }
      },
      "must_not": { "match": {"authors": "radu gheorge" }}
    }
  }
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 2.0749094,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": [
        "clinton gormley",
        "zachary tong"
      ],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "oreilly"
    }
  }
]

Note: As you can see, a bool query can wrap any other query type, including other bool queries, to create arbitrarily complex or deeply nested queries.
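
One subtlety worth sketching: when a bool query contains a must clause, its should clauses are optional by default and only affect scoring, unless minimum_should_match is set. The Python snippet below is a toy in-memory evaluator, not Elasticsearch itself, that mimics this matching rule for match and nested bool clauses. The tokenizer is a rough stand-in for the standard analyzer, the second document is invented for illustration, and only integer minimum_should_match values are handled.

```python
import re


def tokens(text):
    """Lowercase and split on non-alphanumerics (rough stand-in for the standard analyzer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def as_list(clauses):
    return clauses if isinstance(clauses, list) else [clauses]


def matches(clause, doc):
    """Evaluate a match clause (default OR operator) or a nested bool clause."""
    if "bool" in clause:
        return matches_bool(clause["bool"], doc)
    (field, text), = clause["match"].items()
    doc_tokens = set()
    for value in as_list(doc[field]):
        doc_tokens |= tokens(value)
    return bool(tokens(text) & doc_tokens)


def matches_bool(bool_query, doc):
    must = as_list(bool_query.get("must", []))
    should = as_list(bool_query.get("should", []))
    must_not = as_list(bool_query.get("must_not", []))
    if any(matches(c, doc) for c in must_not):
        return False
    if not all(matches(c, doc) for c in must):
        return False
    # With a must clause present, should clauses are optional by default.
    default_min = 0 if must else min(1, len(should))
    required = bool_query.get("minimum_should_match", default_min)
    return sum(matches(c, doc) for c in should) >= required


guide = {"title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"]}
other = {"title": "A Hypothetical Cookbook", "authors": ["clinton gormley"]}  # invented document

inner = {
    "should": [{"match": {"title": "Elasticsearch"}}, {"match": {"title": "Solr"}}],
    "must": {"match": {"authors": "clinton gormley"}},
}
strict = dict(inner, minimum_should_match=1)

print(matches_bool(inner, other))   # the title condition is not enforced here
print(matches_bool(strict, other))  # the title condition is enforced here
print(matches_bool(strict, guide))
```

In this toy model the invented cookbook matches the first form purely on the author clause, which is why an explicit minimum_should_match is useful when the OR condition is meant to be required.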

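Fuzzy Queries

Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified based on the Levenshtein distance from the original word, i.e. the number of single-character changes needed to make one string the same as another.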

POST /bookdb_index/book/_search
{
    "query": {
        "multi_match" : {
            "query" : "comprihensiv guide",
            "fields": ["title", "summary"],
            "fuzziness": "AUTO"
        }
    },
    "_source": ["title", "summary", "publish_date"],
    "size": 1
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 2.4344182,
    "_source": {
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "title": "Solr in Action",
      "publish_date": "2014-04-05"
    }
  }
]

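Note: Instead of specifying "AUTO", you can specify the numbers 0, 1, or 2 to indicate the maximum number of edits that can be made to the string to find a match. The benefit of "AUTO" is that it takes the length of the string into account. For strings that are only 3 characters long, allowing a fuzziness of 2 will result in poor search performance. Therefore, sticking to "AUTO" is recommended in most cases.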
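
The edit-distance idea can be sketched in a few lines of Python. The function below computes the plain Levenshtein distance, and auto_fuzziness mirrors the documented default thresholds for "AUTO" (exact match for terms of 1 to 2 characters, one edit for 3 to 5, two edits for longer terms). This is a simplified model: by default Elasticsearch also counts a transposition of two adjacent characters as a single edit, which this sketch does not.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Maximum edits allowed by the default AUTO setting for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


typo, target = "comprihensiv", "comprehensive"
print(levenshtein(typo, target), auto_fuzziness(typo))
```

The misspelled query term from the example is within the edit budget that AUTO grants a term of its length, which is why the document still matches.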

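Wildcard Query

Wildcard queries let you specify a pattern to match instead of the entire term. ? matches any single character and * matches zero or more characters. For example, to find all records that have an author whose name begins with the letter "t":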

POST /bookdb_index/book/_search
{
    "query": {
        "wildcard" : {
            "authors" : "t*"
        }
    },
    "_source": ["title", "authors"],
    "highlight": {
        "fields" : {
            "authors" : {}
        }
    }
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 1,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": [
        "clinton gormley",
        "zachary tong"
      ]
    },
    "highlight": {
      "authors": [
        "zachary <em>tong</em>"
      ]
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "2",
    "_score": 1,
    "_source": {
      "title": "Taming Text: How to Find, Organize, and Manipulate It",
      "authors": [
        "grant ingersoll",
        "thomas morton",
        "drew farris"
      ]
    },
    "highlight": {
      "authors": [
        "<em>thomas</em> morton"
      ]
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 1,
    "_source": {
      "title": "Solr in Action",
      "authors": [
        "trey grainger",
        "timothy potter"
      ]
    },
    "highlight": {
      "authors": [
        "<em>trey</em> grainger",
        "<em>timothy</em> potter"
      ]
    }
  }
]
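
The highlights above show that the pattern is tested against individual indexed terms, not against the whole field value, so "zachary tong" matches through the term "tong". The Python sketch below imitates that behavior with fnmatchcase from the standard library, whose ? and * have the same meaning. The tokenizer is a rough approximation of the standard analyzer, and the helper names are made up for this example.

```python
import re
from fnmatch import fnmatchcase


def analyzed_terms(values):
    """Lowercase each value and split it into terms (approximation of the standard analyzer)."""
    return [term for value in values for term in re.findall(r"[a-z0-9]+", value.lower())]


def wildcard_hits(values, pattern):
    """Return the terms that the wildcard pattern matches in full."""
    # Note: fnmatch also gives [ ] a special meaning, which this sketch does not rely on.
    return [term for term in analyzed_terms(values) if fnmatchcase(term, pattern)]


print(wildcard_hits(["clinton gormley", "zachary tong"], "t*"))
print(wildcard_hits(["radu gheorge", "matthew lee hinman", "roy russo"], "t*"))
```

A document is a hit when at least one of its terms matches, which is consistent with documents 1, 2, and 4 being returned.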

Regexp Query

Regexp queries let you specify more complex patterns than wildcard queries.

POST /bookdb_index/book/_search
{
    "query": {
        "regexp" : {
            "authors" : "t[a-z]*y"
        }
    },
    "_source": ["title", "authors"],
    "highlight": {
        "fields" : {
            "authors" : {}
        }
    }
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 1,
    "_source": {
      "title": "Solr in Action",
      "authors": [
        "trey grainger",
        "timothy potter"
      ]
    },
    "highlight": {
      "authors": [
        "<em>trey</em> grainger",
        "<em>timothy</em> potter"
      ]
    }
  }
]
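
As with wildcards, the regular expression has to match an entire indexed term. A hedged Python approximation uses re.fullmatch on each term; Lucene's regular expression syntax differs from Python's in several places, but the simple pattern used here means the same thing in both. The tokenizer and helper name are assumptions for illustration.

```python
import re


def regexp_hits(values, pattern):
    """Return the terms that the pattern matches from start to end."""
    compiled = re.compile(pattern)
    terms = [term for value in values for term in re.findall(r"[a-z0-9]+", value.lower())]
    return [term for term in terms if compiled.fullmatch(term)]


print(regexp_hits(["trey grainger", "timothy potter"], "t[a-z]*y"))
print(regexp_hits(["grant ingersoll", "thomas morton", "drew farris"], "t[a-z]*y"))
```

The term "thomas" starts with "t" but does not end in "y", so document 2 drops out compared with the wildcard example.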

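Match Phrase Query

The match phrase query requires that all the terms in the query string be present in the document, in the order specified in the query string, and close to each other. By default, the terms are required to be exactly beside each other, but you can specify a slop value that indicates how far apart terms are allowed to be while still considering the document a match.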

POST /bookdb_index/book/_search
{
    "query": {
        "multi_match" : {
            "query": "search engine",
            "fields": ["title", "summary"],
            "type": "phrase",
            "slop": 3
        }
    },
    "_source": [ "title", "summary", "publish_date" ]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.22327082,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.16113183,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]

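Note: In the example above, for a non-phrase type query, document _id 1 would normally have the higher score and appear ahead of document _id 4 because its field length is shorter. However, as a phrase query, the proximity of the terms is factored in, so document _id 4 scores better.

Note: Also note that if the slop parameter were reduced to 1, document _id 1 would no longer appear in the result set.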
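
To see why a slop of 1 excludes document 1, it helps to count term positions. The Python sketch below uses a simplified model of slop: each term position is shifted back by the term's index in the phrase, and the spread of the shifted positions is the number of position moves needed. The tokenizer approximates the standard analyzer, the function names are invented for this example, and the real scorer has more cases than this model covers.

```python
import re
from itertools import product


def term_positions(text):
    """Map each term to the list of positions at which it occurs."""
    positions = {}
    for index, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        positions.setdefault(term, []).append(index)
    return positions


def required_slop(text, phrase):
    """Smallest slop (in this simplified model) at which the phrase matches, or None."""
    positions = term_positions(text)
    terms = re.findall(r"[a-z0-9]+", phrase.lower())
    if any(term not in positions for term in terms):
        return None
    best = None
    for combo in product(*(positions[term] for term in terms)):
        shifted = [position - offset for offset, position in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best


doc_1 = "A distibuted real-time search and analytics engine"
doc_4 = "Comprehensive guide to implementing a scalable search engine using Apache Solr"
print(required_slop(doc_1, "search engine"))
print(required_slop(doc_4, "search engine"))
```

In document 1 two terms ("and", "analytics") sit between "search" and "engine", so the phrase needs a slop of at least 2 under this model, while document 4 contains the exact phrase.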

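Match Phrase Prefix

Match phrase prefix queries provide search-as-you-type, or a poor man's version of autocomplete, at query time without needing to prepare your data in any way. Like the match_phrase query, it accepts a slop parameter to make the word order and relative positions somewhat less rigid. It also accepts the max_expansions parameter to limit the number of terms matched in order to reduce resource intensity.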

POST /bookdb_index/book/_search
{
    "query": {
        "match_phrase_prefix" : {
            "summary": {
                "query": "search en",
                "slop": 3,
                "max_expansions": 10
            }
        }
    },
    "_source": [ "title", "summary", "publish_date" ]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.5161346,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.37248808,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]

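Note: Query-time search-as-you-type has a performance cost. A better solution is index-time search-as-you-type. For more information, check out the Completion Suggester API or the use of Edge-Ngram filters.

Query String

The query_string query provides a means of executing multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries in a concise shorthand syntax. In the following example, we execute a fuzzy search for the terms "search algorithm" in which one of the book authors is "grant ingersoll" or "tom morton". We search all fields but apply a boost of 2 to the summary field.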

POST /bookdb_index/book/_search
{
    "query": {
        "query_string" : {
            "query": "(saerch~1 algorithm~1) AND (grant ingersoll)  OR (tom morton)",
            "fields": ["title", "authors" , "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "2",
    "_score": 3.571021,
    "_source": {
      "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
      "title": "Taming Text: How to Find, Organize, and Manipulate It",
      "authors": [
        "grant ingersoll",
        "thomas morton",
        "drew farris"
      ]
    },
    "highlight": {
      "summary": [
        "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging"
      ]
    }
  }
]

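Simple Query String

The simple_query_string query is a version of the query_string query that is more suitable for use in a single search box exposed to users, because it replaces AND/OR/NOT with +/|/- respectively, and it discards invalid parts of a query instead of throwing an exception if a user makes a mistake.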

POST /bookdb_index/book/_search
{
    "query": {
        "simple_query_string" : {
            "query": "(saerch~1 algorithm~1) + (grant ingersoll)  | (tom morton)",
            "fields": ["title", "authors" , "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}

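Term/Terms Query

The examples above have been examples of full-text search. Sometimes we are more interested in a structured search in which we want to find an exact match and return the results. The term and terms queries help us here. In the example below, we search for all books in our index published by Manning Publications.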

POST /bookdb_index/book/_search
{
    "query": {
        "term" : {
            "publisher": "manning"
        }
    },
    "_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 1.2231436,
        "_source": {
          "publisher": "manning",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1.2231436,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.2231436,
        "_source": {
          "publisher": "manning",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      }
    ]

Multiple terms can be specified by using the terms keyword instead and passing in an array of search terms.

{
    "query": {
        "terms" : {
            "publisher": ["oreilly", "packt"]
        }
    }
}

Term Query - Sorted

Term query results (like any other query results) can easily be sorted. Multi-level sorting is also allowed.

POST /bookdb_index/book/_search
{
    "query": {
        "term" : {
            "publisher": "manning"
        }
    },
    "_source" : ["title","publish_date","publisher"],
    "sort": [
        { "publish_date": {"order":"desc"}}
    ]
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "3",
    "_score": null,
    "_source": {
      "publisher": "manning",
      "title": "Elasticsearch in Action",
      "publish_date": "2015-12-03"
    },
    "sort": [
      1449100800000
    ]
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": null,
    "_source": {
      "publisher": "manning",
      "title": "Solr in Action",
      "publish_date": "2014-04-05"
    },
    "sort": [
      1396656000000
    ]
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "2",
    "_score": null,
    "_source": {
      "publisher": "manning",
      "title": "Taming Text: How to Find, Organize, and Manipulate It",
      "publish_date": "2013-01-24"
    },
    "sort": [
      1358985600000
    ]
  }
]
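
The sort values in the response above are the publish_date values expressed as milliseconds since the Unix epoch (UTC). The short Python sketch below reproduces them and also builds a request body with a two-level sort. The choice of num_reviews as the secondary sort key is an assumption made only to illustrate multi-level sorting.

```python
from datetime import datetime, timezone


def to_epoch_millis(date_string):
    """Convert a yyyy-MM-dd date (midnight UTC) to milliseconds since the epoch."""
    moment = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


for published in ("2015-12-03", "2014-04-05", "2013-01-24"):
    print(published, to_epoch_millis(published))

# Multi-level sort: newest first, ties broken by review count (illustrative secondary key).
sorted_search = {
    "query": {"term": {"publisher": "manning"}},
    "sort": [
        {"publish_date": {"order": "desc"}},
        {"num_reviews": {"order": "desc"}},
    ],
}
```

Because the results are sorted by a field instead of by relevance, the _score of each hit is null.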

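Note: In ES6, to sort or aggregate by a text field, such as title, you need to enable fielddata on that field. More details can be found in the Elasticsearch Guide.

Range Query

Another structured query example is the range query. In this example, we search for books published in 2015.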

POST /bookdb_index/book/_search
{
    "query": {
        "range" : {
            "publish_date": {
                "gte": "2015-01-01",
                "lte": "2015-12-31"
            }
        }
    },
    "_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1,
        "_source": {
          "publisher": "oreilly",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]

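Note: Range queries work on date, number, and string type fields.

Filtered Bool Query

When using a bool query, you can use a filter clause to filter down the results of a query. In our example, we query for books with the term "Elasticsearch" in the title or summary, but we want to filter our results to only those with 20 or more reviews.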

POST /bookdb_index/book/_search
{
    "query": {
        "bool": {
            "must" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "range" : {
                    "num_reviews": {
                        "gte": 20
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews"]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide"
        }
      }
    ]

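Multiple filters can be combined through the use of the bool filter. In the next example, the filter determines that the returned results must have at least 20 reviews, must not be published before 2015, and should be published by O'Reilly.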

POST /bookdb_index/book/_search
{
    "query": {
        "bool": {
            "must" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "bool": {
                    "must": {
                        "range" : { "num_reviews": { "gte": 20 } }
                    },
                    "must_not": {
                        "range" : { "publish_date": { "lte": "2014-12-31" } }
                    },
                    "should": {
                        "term": { "publisher": "oreilly" }
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]

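Function Score: Field Value Factor

There may be a case where you want to factor the value of a particular field in your document into the calculation of the relevance score. This is typical in scenarios where you want to boost the relevance of a document based on its popularity. In our example, we would like the more popular books (as judged by the number of reviews) to be boosted. This is possible using the field_value_factor function score.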

POST /bookdb_index/book/_search
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "field_value_factor": {
                "field" : "num_reviews",
                "modifier": "log1p",
                "factor" : 2
            }
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.44831306,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.3718407,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.046479136,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.041432835,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      }
    ]

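Note 1: We could have just run a regular multi_match query and sorted by the num_reviews field, but then we would lose the benefits of relevance scoring.

Note 2: There are a number of additional parameters that tweak the extent of the boosting effect on the original relevance score, such as "modifier", "factor", "boost_mode", etc. These are explored in detail in the Elasticsearch Guide.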
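
As a worked sketch of what field_value_factor computes, the Python function below multiplies the field value by factor and then applies the modifier; for log1p that is the base-10 logarithm of the scaled value plus one. With the default boost_mode, the result is multiplied by the query's relevance score. Only two modifiers are modeled, and the query score used in the example is a made-up number.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Value produced by field_value_factor for a single document (two modifiers modeled)."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(scaled + 1)
    raise ValueError(f"modifier not modeled in this sketch: {modifier}")


hypothetical_query_score = 0.25  # invented relevance score, for illustration only
for num_reviews in (12, 18, 20, 23):
    boost = field_value_factor(num_reviews, factor=2, modifier="log1p")
    print(num_reviews, round(boost, 4), round(hypothetical_query_score * boost, 4))
```

The logarithm flattens the effect of large review counts, so popularity nudges the ranking without overwhelming the text relevance.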

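Function Score: Decay Functions

Suppose that instead of wanting to boost incrementally by the value of a field, you have an ideal value you want to target, and you want the boost factor to decay the further away you move from that value. This is typically useful for boosts based on lat/long, or on numeric fields like price or dates. In our contrived example, we search for books on "search engines" ideally published around June 2014.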

POST /bookdb_index/book/_search
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "functions": [
                {
                    "exp": {
                        "publish_date" : {
                            "origin": "2014-06-15",
                            "offset": "7d",
                            "scale" : "30d"
                        }
                    }
                }
            ],
            "boost_mode" : "replace"
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.27420625,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.005920768,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.000011564,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.0000059171475,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
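
The shape of the exponential decay can be sketched from the documented formula: the score is exp(lambda * max(0, distance - offset)) with lambda = ln(decay) / scale, where decay defaults to 0.5. The Python snippet below applies it to the four publish dates, measuring distance in whole days. It is meant to show the ordering and the role of offset and scale; it is not expected to reproduce the exact scores listed above, which depend on how the cluster evaluated the query.

```python
import math
from datetime import date


def exp_decay(value, origin, scale_days, offset_days=0, decay=0.5):
    """Exponential decay score for a date, following the documented formula."""
    distance = abs((value - origin).days)
    lam = math.log(decay) / scale_days
    # Within the offset the score stays at 1; beyond it, it halves every scale_days.
    return math.exp(lam * max(0, distance - offset_days))


origin = date(2014, 6, 15)
publish_dates = {
    "1": date(2015, 2, 7),
    "2": date(2013, 1, 24),
    "3": date(2015, 12, 3),
    "4": date(2014, 4, 5),
}
scores = {doc_id: exp_decay(d, origin, scale_days=30, offset_days=7) for doc_id, d in publish_dates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Since boost_mode is set to replace in the query, the decay value replaces the text relevance score, so the ranking is driven entirely by how close each publish date is to the origin.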

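Function Score: Script Scoring

In the case where the built-in scoring functions do not meet your needs, there is the option to specify a Groovy script to use for scoring. In our example, we want to specify a script that takes the publish_date into consideration before deciding how much to factor in the number of reviews. Newer books may not have as many reviews yet, so they should not be penalized for that. Note that Groovy scripting was removed in Elasticsearch 6.0, where Painless is the default scripting language, so the script below applies to earlier versions.

The scoring script looks like this: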

publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score

To use a scoring script dynamically, we use the script_score parameter:

POST /bookdb_index/book/_search
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "functions": [
                {
                    "script_score": {
                        "params" : {
                            "threshold": "2015-07-30"
                        },
                        "script": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
                    }
                }
            ]
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": {
    "total": 4,
    "max_score": 0.8463001,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.8463001,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.7067348,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.08952084,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.07602123,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      }
    ]
  }

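Note 1: To use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file. It is also possible to use scripts that have been stored on the Elasticsearch server. For more information, see the Elasticsearch reference documentation.

Note 2: JSON cannot include embedded newline characters, so semicolons are used to separate statements.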
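
To check the logic of the scoring script without a cluster, it can be ported to a few lines of Python. The port below is a sketch of the same branching, using the natural logarithm as Math.log does in the script; the function name and the use of date objects instead of epoch milliseconds are choices made for this example.

```python
import math
from datetime import date


def script_score(publish_date, num_reviews, threshold):
    """Give books published after the threshold a small head start on review count."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)


threshold = date(2015, 7, 30)
books = {
    "1": (date(2015, 2, 7), 20),
    "2": (date(2013, 1, 24), 12),
    "3": (date(2015, 12, 3), 18),
    "4": (date(2014, 4, 5), 23),
}
for doc_id, (published, reviews) in books.items():
    print(doc_id, round(script_score(published, reviews, threshold), 4))
```

Only document 3 was published after the threshold, so it is the only one that receives the larger constant; in the query above, the script value is then combined with the text relevance score.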

Original title: "23 Useful Elasticsearch Example Queries"

Author: Tim Ojo

Translator: February
