Getting Started with Elasticsearch (2)

Tencent Cloud Big Data

Last modified 2021-01-08 16:30:33

Part of the column: Tencent Cloud Elasticsearch Service

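The previous article showed how to use the REST API to create indices and documents in Elasticsearch and how to operate on them. This article shows how to search that data. Elasticsearch provides near-real-time search. We continue with the exercise from "Getting Started with Elasticsearch (1)".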

Elasticsearch offers two kinds of search:

queries
aggregations

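The difference is that a query performs full-text search, whereas an aggregation computes statistics and analytics over the data. The two can also be combined, for example by first searching the documents and then aggregating over the matches: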

GET blogs/_search
{
  "query": {
    "match": {
      "title": "community"
    }
  },
  "aggregations": {
    "top_authors": {
      "terms": {
        "field": "author"
      }
    }
  }
}

The search above first finds the documents whose title contains community and then runs the aggregation over those matches.
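The "search first, aggregate second" order can be illustrated with a small in-memory sketch. It does not talk to Elasticsearch: the BLOGS list and the helper name search_then_aggregate are hypothetical, and the word match is a naive stand-in for a real match query.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the blogs index (not data from the article).
BLOGS = [
    {"title": "Join the community forum", "author": "alice"},
    {"title": "Community meetup recap", "author": "bob"},
    {"title": "Release notes", "author": "alice"},
    {"title": "Growing a community", "author": "alice"},
]

def search_then_aggregate(docs, field, word, agg_field):
    """Keep docs whose `field` contains `word`, then count `agg_field` values."""
    # Step 1: the "query" part, a naive whole-word match on lowercased text.
    hits = [d for d in docs if word.lower() in d[field].lower().split()]
    # Step 2: the "terms aggregation" part, computed over the hits only.
    buckets = Counter(d[agg_field] for d in hits)
    return hits, buckets.most_common()

hits, top_authors = search_then_aggregate(BLOGS, "title", "community", "author")
print(len(hits), top_authors)
```

The aggregation only sees the three matching documents, which is why the "Release notes" post does not count towards its author.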

Searching all documents

The following command searches all documents:

GET /_search

No index is specified here, so every index in the cluster is searched. By default 10 hits are returned unless size is set:

GET /_search?size=20

Searching without an index name is also equivalent to:

GET /_all/_search

We can also search several indices at once:

POST /index1,index2,index3/_search

This searches the indices index1, index2 and index3. We can even write:

POST /index*,-index3/_search

This searches every index whose name starts with index, except index3.

To search one specific index, such as twitter, we can do this:

GET twitter/_search

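From the response we can see that the twitter index contains 7 documents. The hits array lists the results. Each hit also has a field called _score, which expresses how relevant the hit is to the search: the higher the score, the better the match. When no sort is specified, results are ordered by score from highest to lowest.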

By default we get 10 results. The size parameter sets how many results are returned, and it can be combined with from for paging.

GET twitter/_search?size=2&from=2

This skips the first two hits and returns only the next two documents. The same approach can be used to page through the results.

The query above is similar to the following DSL query:

GET twitter/_search
{
  "size": 2,
  "from": 2, 
  "query": {
    "match_all": {}
  }
}
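The relationship between a page number and the from/size pair can be sketched as follows. The helper names page_params and paginate and the list of hit ids are hypothetical; the sketch only mimics the windowing on an in-memory list.

```python
def page_params(page, page_size):
    """Translate a 1-based page number into the `from`/`size` pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}

def paginate(hits, page, page_size):
    """Apply the same window to an in-memory list of hits."""
    p = page_params(page, page_size)
    return hits[p["from"]: p["from"] + p["size"]]

all_ids = ["1", "2", "3", "4", "5", "6"]  # hypothetical hit ids, in result order
print(page_params(2, 2))       # the window used by size=2&from=2
print(paginate(all_ids, 2, 2))  # the second page of two hits
```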

We can use filter_path to restrict the response to fewer fields, for example:

GET twitter/_search?filter_path=hits.total

The response then contains only the hits.total branch:

{
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    }
  }
}
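The effect of filter_path can be approximated on a plain dictionary. This is a simplified sketch that handles a single dotted path without wildcards; the function name filter_path mirrors the parameter but is otherwise hypothetical, and the sample response is made up to resemble the one above.

```python
def filter_path(response, path):
    """Keep only the branch of `response` addressed by a dotted `path`."""
    def prune(node, remaining):
        if not remaining:
            return node                      # reached the requested leaf or subtree
        head, rest = remaining[0], remaining[1:]
        if not isinstance(node, dict) or head not in node:
            return None                      # the path does not exist in the response
        child = prune(node[head], rest)
        return None if child is None else {head: child}
    return prune(response, path.split(".")) or {}

# Hypothetical full response, shaped like a search response.
response = {
    "took": 0,
    "timed_out": False,
    "hits": {"total": {"value": 6, "relation": "eq"}, "max_score": 1.0, "hits": []},
}
print(filter_path(response, "hits.total"))
```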

source filtering

We can use _source to specify which fields to return:

GET twitter/_search
{
  "_source": ["user", "city"],
  "query": {
    "match_all": {
    }
  }
}

The result:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "city" : "北京",
          "user" : "张三"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "city" : "北京",
          "user" : "老刘"
        }
      },
      ...
    ]

Only the user and city fields are returned in _source. We can also set _source to false so that no _source is returned at all:

GET twitter/_search
{
  "_source": false,
  "query": {
    "match": {
      "user": "张三"
    }
  }
}

The response:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 3.0808902
      }
    ]

Only metadata such as _id and _score is returned, and no _source fields. _source also accepts wildcard patterns, for example:

GET twitter/_search
{
  "_source": {
    "includes": [
      "user*",
      "location*"
    ],
    "excludes": [
      "*.lat"
    ]
  },
  "query": {
    "match_all": {}
  }
}

The result:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "location" : {
            "lon" : "116.325747"
          },
          "user" : "张三"
        }
      },
      ...
    ]
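The includes/excludes behaviour shown above can be imitated with glob matching on dotted field paths. This is only a sketch: it uses Python's fnmatch as a stand-in for Elasticsearch's own pattern matching, which may differ in edge cases, and the helper names flatten and filter_source are hypothetical.

```python
from fnmatch import fnmatchcase

def flatten(doc, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value

def filter_source(doc, includes, excludes):
    """Keep leaves matching any include pattern and no exclude pattern."""
    result = {}
    for path, value in flatten(doc):
        if not any(fnmatchcase(path, p) for p in includes):
            continue
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        # Rebuild the nested structure for the surviving leaf.
        node = result
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result

doc = {"user": "张三", "city": "北京",
       "location": {"lat": "39.970718", "lon": "116.325747"}}
print(filter_source(doc, ["user*", "location*"], ["*.lat"]))
```

For this document, location.lat matches the include pattern location* but is then dropped by the exclude pattern *.lat, so only user and location.lon remain.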

Script fields

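Sometimes the field we want does not exist in _source at all. In that case we can use script fields to generate it. A script field returns the result of a script evaluation, possibly based on other fields, for each hit. For example: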

GET twitter/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "years_to_100": {
      "script": {
        "lang": "painless",
        "source": "100-doc['age'].value"
      }
    },
    "year_of_birth":{
      "script": "2019 - doc['age'].value"
    }
  }
}

The result:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "years_to_100" : [
            80
          ],
          "year_of_birth" : [
            1999
          ]
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "years_to_100" : [
            70
          ],
          "year_of_birth" : [
            1989
          ]
        }
      },
    ...
  ]

Note that generating results with scripts in this way can consume a lot of resources when many documents are involved.

Count API

We often want to know how many documents an index holds. The _count endpoint answers that:

GET twitter/_count

To count only the documents that satisfy a condition, use the following form:

GET twitter/_count
{
  "query": {
    "match": {
      "city": "北京"
    }
  }
}

This returns the number of documents whose city is "北京":

{
  "count" : 5,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

Changing settings

We can retrieve the settings of an index with the following request:

GET twitter/_settings

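From this we can see how many shards and how many replicas the twitter index has. These values can be set with the following request when the index is created (the request creates the index, so it fails if twitter already exists):

PUT twitter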
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

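Once number_of_shards has been chosen it cannot be changed, short of deleting the index and indexing the data again. The reason is that the shard a document is stored in depends on number_of_shards; if the value changed, later lookups would compute the wrong shard for that document.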
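Why a changed shard count breaks lookups can be seen in a small sketch of modulo routing. It assumes the shard is chosen as a hash of the document id modulo number_of_shards; CRC32 is used here only as a convenient stand-in hash and is not the hash function Elasticsearch uses, and the function name route is hypothetical.

```python
import zlib

def route(doc_id, number_of_shards):
    """Pick a shard as hash(id) % number_of_shards (stand-in hash: CRC32)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards

ids = range(1, 7)
before = {i: route(i, 5) for i in ids}   # shard of each document with 5 shards
after = {i: route(i, 6) for i in ids}    # shard the same ids map to with 6 shards
moved = [i for i in ids if before[i] != after[i]]
print(before)
print(after)
print("ids that would be looked up on the wrong shard:", moved)
```

Any id listed in moved was written to one shard but would later be searched for on another, which is the inconsistency the paragraph above describes.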

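Changing the mapping of an index

Elasticsearch is often described as schemaless, but in practice every index has a mapping. The mapping is generated when the first document is indexed: each incoming field is inspected automatically to infer its data type. Schemaless can be understood as follows: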

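No mapping has to be defined before documents can be indexed. Field types are detected dynamically, unlike in a traditional database.
When new fields appear, the mapping adjusts automatically and picks up the new fields.

Automatic detection has one drawback: some fields may be typed imprecisely, such as the location data in our example. Such a field then needs to be corrected.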
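A rough sketch of how type inference from a first document can go wrong is given below. It is a simplification of dynamic mapping (for instance, it ignores date detection), and the function name infer_mapping is hypothetical. The rule for strings follows the text-plus-keyword pattern visible in the twitter mapping in this article.

```python
def infer_mapping(value):
    """Guess a field mapping from one JSON value, roughly like dynamic mapping."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):      # nested objects get their own properties
        return {"properties": {k: infer_mapping(v) for k, v in value.items()}}
    # Strings become text with a keyword sub-field.
    return {"type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}

doc = {"age": 20, "location": {"lat": "39.970718", "lon": "116.325747"}}
mapping = {"properties": {k: infer_mapping(v) for k, v in doc.items()}}
print(mapping["properties"]["location"]["properties"]["lat"]["type"])
```

Because lat and lon arrive as strings, the inferred type is text rather than a geographic type, which is the imprecision the paragraph above refers to.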

We can inspect the current mapping of the index with the following command:

GET twitter/_mapping

It returns the following:

{
  "twitter" : {
    "mappings" : {
      "properties" : {
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "country" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "location" : {
          "properties" : {
            "lat" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "lon" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "province" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "uid" : {
          "type" : "long"
        },
        "user" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

The output shows that the latitude and longitude under location were mapped as multi-fields (text with a keyword sub-field):

        "location" : {
          "properties" : {
            "lat" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "lon" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }

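This is clearly not what we need; the correct type is geo_point. Let us fix the mapping.

Note: the type of a field that already exists in an index's mapping cannot be changed in place, because the data indexed under the old mapping would no longer be searchable correctly. One way out is to reindex the data into a newly created index. Adding a new field to an existing mapping, on the other hand, does not require rebuilding the index.

To create the mapping correctly, we first delete the existing twitter index and recreate it with its settings. The steps are: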

DELETE twitter

PUT twitter
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
 
PUT twitter/_mapping
{
  "properties": {
    "address": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "age": {
      "type": "long"
    },
    "city": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "country": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "location": {
      "type": "geo_point"
    },
    "message": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "province": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "uid": {
      "type": "long"
    },
    "user": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

Check the mapping again:

GET twitter/_mapping

The new mapping is now in place. We run the earlier bulk request again to load the data we need into the twitter index.

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发，下一站云南！","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

At this point the index we need is fully built. Next we use the DSL (Domain Specific Language) to run queries.

Querying data

This section shows how to retrieve the data we want from our Elasticsearch index.

match query

GET twitter/_search
{
  "query": {
    "match": {
      "city": "北京"
    }
  }
}

The results show 5 users from Beijing, ordered by relevance.

In many cases the same thing can also be done with a script query:

GET twitter/_search
{
  "query": {
    "script": {
      "script": {
        "source": "doc['city.keyword'].contains(params.name)",
        "lang": "painless",
        "params": {
          "name": "北京"
        }
      }
    }
  }
}

The script query above returns the same results as the match query, but we do not recommend it: a script query is comparatively inefficient.

The same search can also be written as:

GET twitter/_search?q=city:"北京"

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "双榆树-张三",
          "message" : "今儿天气不错啊，出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        }
      }
     ...
    ]

To learn more, see the article "Elasticsearch: 使用URI Search".

If we do not need the score, we can use a filter instead.

GET twitter/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "city.keyword": "北京"
        }
      }
    }
  }
}

Here a filter restricts the search. The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "user" : "双榆树-张三",
          "message" : "今儿天气不错啊，出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        }
      },
 
   ...
}

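In the response, _score is 0. For this kind of search only a yes or no answer matters; relevance is not of interest. Here we used city.keyword, which may look unfamiliar to people new to Elasticsearch. In our mapping, city is a multi-field: it is indexed both as text and as keyword. For a keyword field, the whole value is treated as a single string: it is not analyzed (split into tokens) when the document is indexed. keyword fields are used for exact-match search, aggregations and sorting.

That is why the filter uses a term query.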
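The difference between the analyzed text field and its keyword sub-field can be sketched like this. The analyze function is only a rough stand-in for the default analyzer (one token per Chinese character, lowercased ASCII words), and index_field is a hypothetical helper.

```python
import re

def analyze(text):
    """Rough stand-in analyzer: one token per CJK character, lowercased ASCII words."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)]

def index_field(value):
    """Return the terms stored for the text field and for its keyword sub-field."""
    return {"text": set(analyze(value)), "keyword": {value}}

terms = index_field("北京")
print(terms["text"])     # the individual characters
print(terms["keyword"])  # the whole, unanalyzed string

# A term lookup is an exact comparison against the stored terms.
print("北京" in terms["keyword"], "北京" in terms["text"])
```

Under these assumptions the exact string "北京" exists only among the keyword terms, which is why the term filter targets city.keyword rather than city.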

The following achieves the same effect:

GET twitter/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "city.keyword": {
            "value": "北京"
          }
        }
      }
    }
  }
}

With a match query the default operator is OR. For example:

GET twitter/_search
{
  "query": {
    "match": {
      "user": {
        "query": "朝阳区-老贾",
        "operator": "or"
      }
    }
  }
}

The query above is equivalent to the following:

GET twitter/_search
{
 "query": {
   "match": {
     "user": "朝阳区-老贾"
   }
 }
}

This is because or is the default operator. The query returns every document that matches any one of the five characters "朝", "阳", "区", "老" and "贾":

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 4.4209847,
        "_source" : {
          "user" : "朝阳区-老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 2.9019678,
        "_source" : {
          "user" : "朝阳区-老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 50,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.8713734,
        "_source" : {
          "user" : "东城区-老刘",
          "message" : "出发，下一站云南！",
          "uid" : 3,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.4753614,
        "_source" : {
          "user" : "虹桥-老吴",
          "message" : "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
          "uid" : 7,
          "age" : 90,
          "city" : "上海",
          "province" : "上海",
          "country" : "中国",
          "address" : "中国上海市闵行区",
          "location" : {
            "lat" : "31.175927",
            "lon" : "121.383328"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.4356867,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      }
    ]

We can also set the minimum_should_match parameter to require a minimum number of matching terms. For example:

GET twitter/_search
{
  "query": {
    "match": {
      "user": {
        "query": "朝阳区-老贾",
        "operator": "or",
        "minimum_should_match": 3
      }
    }
  }
}

This requires at least 3 of the 5 characters "朝", "阳", "区", "老" and "贾" to match. The result:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 4.4209847,
        "_source" : {
          "user" : "朝阳区-老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 2.9019678,
        "_source" : {
          "user" : "朝阳区-老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 50,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          }
        }
      }
    ]

We can also switch to the "and" operator:

GET twitter/_search
{
  "query": {
    "match": {
      "user": {
        "query": "朝阳区-老贾",
        "operator": "and"
      }
    }
  }
}

The result:

   "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 4.4209847,
        "_source" : {
          "user" : "朝阳区-老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        }
      }
    ]

In this case all five characters must match. Using and clearly makes the search more precise.
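The three variants (or, minimum_should_match and and) can be compared side by side in a small simulation over the six user values from the bulk request. The sketch only decides which documents match; it does not compute relevance scores, so the order is by id. The analyze function is a rough stand-in for the default analyzer and the function name match is hypothetical.

```python
import re

# user values taken from the bulk request in this article, keyed by _id
USERS = {1: "双榆树-张三", 2: "东城区-老刘", 3: "东城区-李四",
         4: "朝阳区-老贾", 5: "朝阳区-老王", 6: "虹桥-老吴"}

def analyze(text):
    """Rough stand-in analyzer: one token per CJK character, lowercased ASCII words."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)]

def match(query, operator="or", minimum_should_match=1):
    """Return the ids whose user field shares enough terms with the query."""
    q_terms = set(analyze(query))
    # "and" requires every query term; "or" requires at least minimum_should_match.
    required = len(q_terms) if operator == "and" else minimum_should_match
    return [doc_id for doc_id, user in USERS.items()
            if len(q_terms & set(analyze(user))) >= required]

print(match("朝阳区-老贾"))                          # operator "or"
print(match("朝阳区-老贾", minimum_should_match=3))  # at least 3 terms
print(match("朝阳区-老贾", operator="and"))          # all 5 terms
```

The matching sets shrink from five documents to two to one, in line with the three result listings above.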

Multi_match

In the searches above we named one specific field to search. In many cases, however, we do not know which field contains the keyword. In that situation we can use multi_match:

GET twitter/_search
{
  "query": {
    "multi_match": {
      "query": "朝阳",
      "fields": [
        "user",
        "address^3",
        "message"
      ],
      "type": "best_fields"
    }
  }
}

Here we search three fields at once: user, address and message. Matches of "朝阳" in address have their score boosted by a factor of 3. The result:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 6.1777167,
        "_source" : {
          "user" : "朝阳区-老王",
          "message" : "Happy good BirthDay My Friend!",
          "uid" : 6,
          "age" : 50,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 5.9349246,
        "_source" : {
          "user" : "朝阳区-老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        }
      }
    ]
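How a per-field boost such as address^3 feeds into a best_fields result can be sketched with a toy scoring function. The score used here is simply the number of matching query terms, not the real relevance formula, so the numbers will not match the _score values above. The names toy_score and best_fields are hypothetical, and taking the maximum per-field score is an assumption about best_fields without a tie breaker.

```python
import re

def analyze(text):
    """Rough stand-in analyzer: one token per CJK character, lowercased ASCII words."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)]

def toy_score(query, text):
    """Toy relevance: number of distinct query terms found in the field."""
    return len(set(analyze(query)) & set(analyze(text)))

def best_fields(doc, query, fields):
    """Parse `name^boost` entries and return the best boosted per-field score."""
    best = 0.0
    for spec in fields:
        name, _, boost = spec.partition("^")
        score = toy_score(query, doc.get(name, "")) * float(boost or 1)
        best = max(best, score)
    return best

doc = {"user": "朝阳区-老王", "message": "Happy BirthDay My Friend!",
       "address": "中国北京市朝阳区国贸"}
print(best_fields(doc, "朝阳", ["user", "address^3", "message"]))
print(best_fields(doc, "朝阳", ["user", "address", "message"]))
```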

Prefix query

Returns documents that contain a specific prefix in the given field.

GET twitter/_search
{
  "query": {
    "prefix": {
      "user": {
        "value": "朝"
      }
    }
  }
}

This finds the documents whose user field contains a term beginning with "朝":

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "user" : "朝阳区-老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "user" : "朝阳区-老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 50,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          }
        }
      }
    ]

Term query

A term query looks for an exact term in the given field, so you need to supply the exact term to get the right results.

GET twitter/_search
{
  "query": {
    "term": {
      "user.keyword": {
        "value": "朝阳区-老贾"
      }
    }
  }
}

Here we use user.keyword to match "朝阳区-老贾" exactly and retrieve the corresponding document:

    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.5404451,
        "_source" : {
          "user" : "朝阳区-老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        }
      }
    ]

Terms query

To query for several terms at once, we can use the following form:

GET twitter/_search
{
  "query": {
    "terms": {
      "user.keyword": [
        "双榆树-张三",
        "东城区-老刘"
      ]
    }
  }
}

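The query above finds all documents whose user.keyword is exactly "双榆树-张三" or "东城区-老刘".

Compound queries

What is a compound query? If the queries shown so far are leaf queries, a compound query combines several leaf queries into a more complex one. Its general form is: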

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "filter": {
        "term" : { "tag" : "tech" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

As shown above, a bool query is composed of must, must_not, should and filter clauses. Applied to our example:

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "city": "北京"
          }
        },
        {
          "match": {
            "age": "30"
          }
        }
      ]
    }
  }
}

This query requires that the city is Beijing and that the age is exactly 30.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4823241,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.4823241,
        "_source" : {
          "user" : "东城区-老刘",
          "message" : "出发，下一站云南！",
          "uid" : 3,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.4823241,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      }
    ]
  }
}

To find out why we get these results, we can add "explain": true to the search request.

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "city": "北京"
          }
        },
        {
          "match": {
            "age": "30"
          }
        }
      ]
    }
  },
  "explain": true
}

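With this option, each hit in the response carries an explanation of how its score was computed.

The query returns 2 results. Likewise, we can use must_not to exclude documents that satisfy a condition.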

GET twitter/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "city": "北京"
          }
        }
      ]
    }
  }
}

This finds all documents that are not in Beijing:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.0,
        "_source" : {
          "user" : "虹桥-老吴",
          "message" : "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
          "uid" : 7,
          "age" : 90,
          "city" : "上海",
          "province" : "上海",
          "country" : "中国",
          "address" : "中国上海市闵行区",
          "location" : {
            "lat" : "31.175927",
            "lon" : "121.383328"
          }
        }
      }
    ]
  }
}

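Only one document is returned: it is from Shanghai, and all the others are from Beijing.

Next, let us try should. It expresses an "or": nice to have, but not required. For example: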

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "age": "30"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "message": "Happy birthday"
          }
        }
      ]
    }
  }
}

This search means that age must be 30, and if a document also contains "Happy birthday", its relevance is higher and it ranks nearer the top:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.641438,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.641438,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "user" : "东城区-老刘",
          "message" : "出发，下一站云南！",
          "uid" : 3,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          }
        }
      }
    ]
  }
}

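In the results above, both documents have age 30, but the first contains the phrase "Happy birthday" in its message, so it is more relevant and ranks first, as its _score shows. The second document also has age 30 but no "Happy birthday" in its message; it is still returned, only with a lower score.

In a compound query like this, a bool request is usually built from one or more of must, must_not, should and filter. Note the following:

How each clause type affects hits and _score

Clause	Affects #hits	Affects _score
must	Yes	Yes
must_not	Yes	No
should	No*	Yes
filter	Yes	No

As the table shows, should affects the hits only in a special case (marked * above); normally it does not change the number of matching documents. The special case is a bool query that contains only should clauses, that is, no must, must_not or filter: then at least one of the should clauses must match for a document to be returned. For example: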

GET twitter/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "city": "北京"
          }
        },
        {
          "match": {
            "city": "武汉"
          }
        }
      ]
    }
  }
}

The query above returns:

  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 0.48232412,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "双榆树-张三",
          "message" : "今儿天气不错啊，出去转转去",
          "uid" : 2,
          "age" : 20,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "东城区-老刘",
          "message" : "出发，下一站云南！",
          "uid" : 3,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          }
        }
      },
  ...
}

In this case should does affect which documents are returned.
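The clause semantics from the table can be condensed into a toy evaluator for a single document. Predicates are plain Python functions, the score is just a count of matching must and should clauses, and the default for minimum_should_match (1 when there are should clauses but no must or filter, otherwise 0) is an assumption of this sketch. The name bool_match is hypothetical.

```python
def bool_match(doc, must=(), filter_=(), must_not=(), should=(),
               minimum_should_match=None):
    """Return (matches, toy_score) for one document under bool-query semantics."""
    if not all(p(doc) for p in list(must) + list(filter_)):
        return False, 0
    if any(p(doc) for p in must_not):
        return False, 0
    should_hits = sum(1 for p in should if p(doc))
    if minimum_should_match is None:
        # Assumed default: 1 if only should clauses decide the match, else 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if should_hits < minimum_should_match:
        return False, 0
    # Only must and should add to the toy score; filter and must_not do not.
    return True, len(must) + should_hits

is_30 = lambda d: d["age"] == 30
happy = lambda d: "happy birthday" in d["message"].lower()
in_beijing = lambda d: d["city"] == "北京"
in_wuhan = lambda d: d["city"] == "武汉"

doc2 = {"age": 30, "city": "北京", "message": "出发，下一站云南！"}
doc3 = {"age": 30, "city": "北京", "message": "happy birthday!"}
doc6 = {"age": 90, "city": "上海", "message": "birthday happy"}

print(bool_match(doc3, must=[is_30], should=[happy]))   # matches, higher score
print(bool_match(doc2, must=[is_30], should=[happy]))   # matches, lower score
print(bool_match(doc6, should=[in_beijing, in_wuhan]))  # should-only: no match
```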

Geo queries

Location (geo) queries are one of Elasticsearch's strengths, a capability that many relational databases lack. A simple example:

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "北京"
          }
        }
      ]
    }
  },
  "post_filter": {
    "geo_distance": {
      "distance": "3km",
      "location": {
        "lat": 39.920086,
        "lon": 116.454182
      }
    }
  }
}

Here we look for all documents whose address contains "北京" and that lie within 3 km of the point with longitude 116.454182 and latitude 39.920086.

{
  "took" : 58,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.48232412,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "朝阳区-老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 50,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          }
        }
      }
    ]
  }
}

Only one document satisfies the query.
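The distance check behind this filter can be reproduced approximately with the haversine formula. The sketch below uses the coordinates from the bulk request and a mean Earth radius of 6371.0088 km, which is an assumption of this example; it ignores the address match and only looks at distance. The helper names haversine_km and within are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius for this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# (lat, lon) per document _id, copied from the bulk request in this article
LOCATIONS = {1: (39.970718, 116.325747), 2: (39.904313, 116.412754),
             3: (39.893801, 116.408986), 4: (39.718256, 116.367910),
             5: (39.918256, 116.467910), 6: (31.175927, 121.383328)}
CENTER = (39.920086, 116.454182)

def within(distance_km):
    """Return (doc_id, distance) pairs inside the radius, nearest first."""
    found = [(haversine_km(*CENTER, lat, lon), doc_id)
             for doc_id, (lat, lon) in LOCATIONS.items()]
    return [(doc_id, round(d, 3)) for d, doc_id in sorted(found) if d <= distance_km]

print(within(3))
print(within(5))
```

With a 3 km radius only document 5 remains, which agrees with the single hit returned above.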

Next we find all documents within 5 km and sort them by distance:

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "北京"
          }
        }
      ]
    }
  },
  "post_filter": {
    "geo_distance": {
      "distance": "5km",
      "location": {
        "lat": 39.920086,
        "lon": 116.454182
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": "39.920086,116.454182",
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

Here sort is used to order the search results by distance, in ascending order.

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "user" : "朝阳区-老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 50,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          }
        },
        "sort" : [
          1.1882901656104885
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "user" : "东城区-老刘",
          "message" : "出发，下一站云南！",
          "uid" : 3,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          }
        },
        "sort" : [
          3.9447355972239952
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        },
        "sort" : [
          4.837769064666224
        ]
      }
    ]
  }
}

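Three results are returned. The values under sort show the distance (in km) growing from one hit to the next. Note also that when _score is not one of the sort fields, the _score of every hit becomes null once sort is used. The search above can also be written directly as follows, with the geo_distance condition moved into the filter clause of the bool query: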

GET twitter/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "address": "北京"
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": 39.920086,
            "lon": 116.454182
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": "39.920086,116.454182",
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}
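As a rough cross-check of the sort values in the response above, the distances can be recomputed with the haversine formula. This is only a sketch in Python: the function name `haversine_km` and the Earth radius constant are assumptions of this example, and it is not meant to reproduce how Elasticsearch computes distances internally. The coordinates are the ones that appear in the query and in the three hits.

```python
import math

# Mean Earth radius in km. This constant is an assumption of the sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The reference point used in the geo_distance filter and sort.
origin = (39.920086, 116.454182)

# Locations of the three hits, keyed by document id.
docs = {
    "5": (39.918256, 116.467910),
    "2": (39.904313, 116.412754),
    "3": (39.893801, 116.408986),
}

distances = {
    doc_id: haversine_km(*origin, lat, lon)
    for doc_id, (lat, lon) in docs.items()
}

# Keep only points within 5 km, nearest first (like "order": "asc").
within_5km = sorted((d, doc_id) for doc_id, d in distances.items() if d <= 5.0)
for d, doc_id in within_5km:
    print(doc_id, round(d, 3))
```

The computed distances should land very close to the sort values in the response (about 1.19 km, 3.94 km and 4.84 km), which is a quick way to convince ourselves that the sort values really are distances in km.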

Range query

In ES we can also run range queries, which select documents whose field value falls within a given range:

GET twitter/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 40
      }
    }
  }
}

Here we query for documents whose age is between 30 and 40, inclusive on both ends:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "user" : "东城区-老刘",
          "message" : "出发，下一站云南！",
          "uid" : 3,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "user" : "朝阳区-老贾",
          "message" : "123,gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        }
      }
    ]
  }
}

As shown above, we found 3 matching documents. In the same way, we can also sort them:

GET twitter/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 40
      }
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

The search results are now sorted by age in descending order.
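To make the meaning of the bounds concrete, here is a small Python sketch that mimics the range filter and the descending sort in memory. The helper `in_range` is hypothetical and only models the gt / gte / lt / lte comparisons. The ids and ages are taken from the documents shown in this article.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True if value satisfies every bound that was supplied."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


# Ids and ages as they appear in the twitter documents of this article.
docs = [
    {"_id": "2", "age": 30},
    {"_id": "3", "age": 30},
    {"_id": "4", "age": 35},
    {"_id": "5", "age": 50},
    {"_id": "6", "age": 90},
]

# Same bounds as the query: 30 <= age <= 40.
hits = [d for d in docs if in_range(d["age"], gte=30, lte=40)]

# Same ordering as "sort": [{"age": {"order": "desc"}}].
hits_desc = sorted(hits, key=lambda d: d["age"], reverse=True)
print([(d["_id"], d["age"]) for d in hits_desc])
```

Because gte and lte are inclusive, the two documents with age 30 are kept. Switching to gt=30 in this sketch would drop them.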

Exists query

We can use exists to check whether a field is present in a document. For example, let's add one more document:

PUT twitter/_doc/20
{
  "user" : "王二",
  "message" : "今儿天气不错啊，出去转转去",
  "uid" : 20,
  "age" : 40,
  "province" : "北京",
  "country" : "中国",
  "address" : "中国北京市海淀区",
  "location" : {
    "lat" : "39.970718",
    "lon" : "116.325747"
  }
}

This document has no city field, so the following search will not return it.

GET twitter/_search
{
  "query": {
    "exists": {
      "field": "city"
    }
  }
}

A document is returned as long as its city field has a value. Conversely, a document whose city field is missing or null is not returned.

To find all documents that do not contain the city field, we can query like this:

GET twitter/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "city"
        }
      }
    }
  }
}
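The two queries above can be modelled in a few lines of Python. This is a simplified sketch rather than Elasticsearch's actual logic: the helper `field_exists` is hypothetical, and it assumes that null, an empty array, or an array holding only nulls count as "no value", while an empty string still counts as a value. The documents with ids "a" and "b" are made up for illustration.

```python
def field_exists(doc, field):
    """Simplified model of the exists query for a top-level field."""
    value = doc.get(field)
    if value is None:
        return False  # field missing or explicitly null
    if isinstance(value, list):
        # [] and [None] carry no value; [None, "x"] does.
        return any(item is not None for item in value)
    return True  # any other value, including "", counts as existing


docs = [
    {"_id": "3", "user": "东城区-李四", "city": "北京"},
    {"_id": "20", "user": "王二"},          # no city field, as in the article
    {"_id": "a", "city": None},             # hypothetical: explicit null
    {"_id": "b", "city": ""},               # hypothetical: empty string
]

has_city = [d["_id"] for d in docs if field_exists(d, "city")]
no_city = [d["_id"] for d in docs if not field_exists(d, "city")]  # must_not
print(has_city, no_city)
```

The second list corresponds to wrapping exists in must_not, as in the query above.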

Matching phrases

We can search for happy birthday as follows:

GET twitter/_search
{
  "query": {
    "match": {
      "message": "happy birthday"
    }
  }
}

The results:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.9936417,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.9936417,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.733287,
        "_source" : {
          "user" : "朝阳区-老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : 6,
          "age" : 50,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          }
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.84768087,
        "_source" : {
          "user" : "虹桥-老吴",
          "message" : "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
          "uid" : 7,
          "age" : 90,
          "city" : "上海",
          "province" : "上海",
          "country" : "中国",
          "address" : "中国上海市闵行区",
          "location" : {
            "lat" : "31.175927",
            "lon" : "121.383328"
          }
        }
      }
    ]
  }
}

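By default, the terms in a match query are combined with "or", so the query finds documents whose message contains either "happy" or "birthday". Suppose we add a new document: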

PUT twitter/_doc/8
{
  "user": "朝阳区-老王",
  "message": "Happy",
  "uid": 6,
  "age": 50,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "address": "中国北京市朝阳区国贸",
  "location": {
    "lat": "39.918256",
    "lon": "116.467910"
  }
}

If we run the search again, the newly added document with id 8 also appears in the results, even though its message contains only "Happy".

If we want an "and" relationship instead, we can do the following:

GET twitter/_search
{
  "query": {
    "match": {
      "message": {
        "query": "happy birthday",
        "operator": "and"
      }
    }
  }
}

After this change the document with id 8 no longer appears, because message must now match both "happy" and "birthday".

There is another way to do this:

GET twitter/_search
{
  "query": {
    "match": {
      "message": {
        "query": "happy birthday",
        "minimum_should_match": 2
      }
    }
  }
}

Here we use "minimum_should_match" to state that at least 2 of the query terms must match.
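The three variants above ("or", "and", and minimum_should_match) can be compared with a toy model in Python. This is a sketch under loud assumptions: `tokenize` is a hypothetical stand-in for the analyzer that only lowercases and keeps ASCII words, and `match` only counts how many query terms occur in the message. Real scoring and analysis in Elasticsearch are far richer. The messages are the ones used in this article.

```python
import re


def tokenize(text):
    """Toy analyzer (assumption): lowercase, keep ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(message, query, operator="or", minimum_should_match=None):
    """Toy model of a match query on a single field."""
    doc_terms = set(tokenize(message))
    query_terms = tokenize(query)
    matched = sum(1 for term in query_terms if term in doc_terms)
    if minimum_should_match is not None:
        return matched >= minimum_should_match
    if operator == "and":
        return matched == len(query_terms)
    return matched >= 1  # default "or"


messages = {
    "3": "happy birthday!",
    "5": "Happy BirthDay My Friend!",
    "6": "好友来了都今天我生日，好友来了,什么 birthday happy 就成!",
    "8": "Happy",
}

or_hits = [i for i, m in messages.items() if match(m, "happy birthday")]
and_hits = [i for i, m in messages.items()
            if match(m, "happy birthday", operator="and")]
msm_hits = [i for i, m in messages.items()
            if match(m, "happy birthday", minimum_should_match=2)]
print(or_hits, and_hits, msm_hits)
```

In this model the document with id 8 is found only by the default "or" variant, which mirrors the behaviour described above.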

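The results also show that matching is case-insensitive: documents are found whether the text is in upper or lower case. In addition, the order in which happy and birthday appear in message does not matter. For example, let's change the document with id 5 to: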

PUT twitter/_doc/5
{
  "user": "朝阳区-老王",
  "message": "BirthDay My Friend Happy !",
  "uid": 6,
  "age": 50,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "address": "中国北京市朝阳区国贸",
  "location": {
    "lat": "39.918256",
    "lon": "116.467910"
  }
}

Here we deliberately moved BirthDay in front of Happy. We can run the query above again to check whether the document with id 5 is still found.

Clearly, a match query does not care about the order of the terms. Let's now try match_phrase:

GET twitter/_search
{
  "query": {
    "match_phrase": {
      "message": "Happy birthday"
    }
  },
  "highlight": {
    "fields": {
      "message": {}
    }
  }
}

Here we use match_phrase, which requires Happy to come directly before birthday. The search results are:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.6363969,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.6363969,
        "_source" : {
          "user" : "东城区-李四",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        },
        "highlight" : {
          "message" : [
            "<em>happy</em> <em>birthday</em>!"
          ]
        }
      }
    ]
  }
}

Now suppose we change the document with id 5 to:

PUT twitter/_doc/5
{
  "user": "朝阳区-老王",
  "message": "Happy Good BirthDay My Friend!",
  "uid": 6,
  "age": 50,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "address": "中国北京市朝阳区国贸",
  "location": {
    "lat": "39.918256",
    "lon": "116.467910"
  }
}

Here we inserted Good between Happy and BirthDay. The earlier match_phrase query can no longer find this document. To find the modified document, we can use:

GET twitter/_search
{
  "query": {
    "match_phrase": {
      "message": {
        "query": "Happy birthday",
        "slop": 1
      }
    }
  },
  "highlight": {
    "fields": {
      "message": {}
    }
  }
}

Note: here we set slop to 1, which allows a gap of one token between Happy and birthday.
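The effect of slop can be illustrated with a toy phrase matcher in Python. This is a simplified sketch, not the real implementation: `tokenize` is a hypothetical analyzer that lowercases and keeps ASCII words, and `match_phrase` accepts a document when the query terms can be lined up so that their positions drift from the exact phrase by at most slop. The messages are the three versions of the id 5 document plus the id 3 document from this article.

```python
import re
from itertools import product


def tokenize(text):
    """Toy analyzer (assumption): lowercase, keep ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_phrase(message, phrase, slop=0):
    """Toy model of match_phrase with slop on a single field."""
    doc_tokens = tokenize(message)
    query_tokens = tokenize(phrase)
    # Positions in the document where each query term occurs.
    positions = [
        [i for i, tok in enumerate(doc_tokens) if tok == q]
        for q in query_tokens
    ]
    if not query_tokens or any(not p for p in positions):
        return False
    for combo in product(*positions):
        # For an exact phrase every term has the same (position - offset).
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


print(match_phrase("happy birthday!", "Happy birthday"))
print(match_phrase("Happy Good BirthDay My Friend!", "Happy birthday"))
print(match_phrase("Happy Good BirthDay My Friend!", "Happy birthday", slop=1))
print(match_phrase("BirthDay My Friend Happy !", "Happy birthday", slop=1))
```

With slop left at 0, only the exact phrase matches. With slop set to 1, the message containing Good between the two words matches as well, while the reordered message still does not.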

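(Because of length limits, Named queries, SQL queries, the Multi Search API, multi-index operations and the Profile API are not covered here. Please see the original article linked below.)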

Summary

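In today's article, we showed how to use the DSL provided by Elasticsearch to search our index. Elasticsearch offers a rich set of ways to search an index, and what we covered here is only a starting point. In the next article, "Getting Started with Elasticsearch (3)", we will focus on aggregations and analyzers.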

If you would like to learn more about the Elastic Stack, please see our official website: https://www.elastic.co/guide/index.html

————————————————

Original article: https://elasticstack.blog.csdn.net/article/details/99546568