Elasticsearch: A First Look at Queries

用户2825413

Published 2019-07-16 11:00:20

Included in the column: 呆呆熊的技术路

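This section covers several common Elasticsearch queries. I wrote it so that when I come back to it later, I can quickly recall what each one means.

Everything in this article was tested on Elasticsearch 2.3.3.

Let's start today's topic by searching for the user 小苍苍: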

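1. Method one (search across all fields)

Not recommended here. We already know we want the name field, but this searches every field (the _all field by default), so the results are imprecise.

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?q=小苍苍'
```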
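The URI search above puts raw Chinese characters in the URL. Below is a minimal Python sketch, standard library only, that builds the same URL with the q parameter percent-encoded as UTF-8. It assumes the host, index `synctest` and type `article` from the example, and `uri_search_url` is a hypothetical helper name. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_search_url(host, index, doc_type, q):
    """Build a URI-search URL such as /synctest/article/_search?q=...,
    percent-encoding the query text as UTF-8 (hypothetical helper)."""
    params = urlencode({"q": q, "pretty": "true"})
    return f"http://{host}/{index}/{doc_type}/_search?{params}"


url = uri_search_url("127.0.0.1:9200", "synctest", "article", "小苍苍")
print(url)  # ASCII-only URL; q is percent-encoded

# Decoding the query string gives the original search text back.
decoded = parse_qs(urlsplit(url).query)["q"][0]
```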

2. Method two (term: contains an exact value)

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "filter":{
        "term":{
            "name":"小苍苍"
        }
    }
}'
```

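The general rule is to use query clauses for full-text search or for any search that should affect the relevance score, and to use filters for everything else. The recommended pattern is query + filter: the filter part can be cached, and only the query part is scored. We will meet this combination below.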

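Note that term here means "contains this exact value". If name is ["小苍苍","小衣衣"], the condition still holds. Also, term is not analyzed, so it only matches if 小苍苍 was indexed as a single token (for example, when name is mapped as not_analyzed).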
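The "contains an exact value" rule for term can be modeled in a few lines of Python. This is a toy sketch, not Elasticsearch's implementation. It assumes name behaves like a not_analyzed field, and the documents are hypothetical.

```python
def term_matches(doc, field, value):
    """Toy model of a term filter: an exact, un-analyzed comparison.
    If the field holds an array, the doc matches when any element equals value."""
    actual = doc.get(field)
    if isinstance(actual, list):
        return value in actual
    return actual == value


# Hypothetical documents, with name treated as a not_analyzed field.
docs = [
    {"id": 1, "name": "小苍苍"},
    {"id": 2, "name": ["小苍苍", "小衣衣"]},
    {"id": 3, "name": "小衣衣"},
    {"id": 4, "name": "小苍苍呀"},  # a longer value is not an exact match
]
hits = [d["id"] for d in docs if term_matches(d, "name", "小苍苍")]
print(hits)  # [1, 2]
```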

3. Method three (term inside query)

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "term":{
            "name":"小苍苍"
        }
    }
}'
```

```json
{
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "synctest",
      "_type" : "article",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source" : {
        "name" : "小苍苍"
      }
    } ]
  }
}
```

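A term inside query is scored by default. If you don't need scores, you can turn scoring off, which gives better performance and makes the result cacheable.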

4. Method four (filtered query with a filter: scoring disabled)

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "filtered":{
            "filter":{
                "term":{
                    "name":"小苍苍"
                }
            }
        }
    }
}'
```

```json
{
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "synctest",
      "_type" : "article",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "name" : "小苍苍"
      }
    } ]
  }
}
```

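A filter does not compute a score (every hit gets the constant 1.0 seen above), and its results can be cached. So when you don't need scoring, using a filter to look up 小苍苍 improves search performance in most cases. Note that the filtered query has been deprecated since 2.0 in favor of a bool query with a filter clause, which appears later in this article.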

You can also disable scoring with constant_score:

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "constant_score":{
            "filter":{
                "term":{
                    "name":"小苍苍"
                }
            }
        }
    }
}'
```
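If you send the same body from code instead of curl, a small Python sketch helps. The request is only prepared with the standard library's `urllib.request.Request`, not sent. `constant_score_body` and `search_request` are hypothetical helper names, and the URL mirrors the examples above.

```python
import json
from urllib.request import Request


def constant_score_body(filter_clause):
    """Wrap a filter clause in constant_score so every hit gets the same score."""
    return {"query": {"constant_score": {"filter": filter_clause}}}


def search_request(base_url, index, doc_type, body):
    """Prepare (but do not send) a POST to /{index}/{doc_type}/_search with a JSON body."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base_url}/{index}/{doc_type}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = constant_score_body({"term": {"name": "小苍苍"}})
req = search_request("http://127.0.0.1:9200", "synctest", "article", body)
# urllib.request.urlopen(req) would send it to a running node.
```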

Combining multiple conditions

**1. select * from article where name in ("小苍苍","小衣衣");**

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "constant_score":{
            "filter":{
                "terms":{
                    "name":[
                        "小苍苍",
                        "小衣衣"
                    ]
                }
            }
        }
    }
}'
```
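Here terms plays the role of SQL IN. A toy Python model of the matching rule follows, under the same not_analyzed assumption as before and with hypothetical documents. A document matches when its value, or any element of an array value, equals one of the listed values.

```python
def terms_matches(doc, field, values):
    """Toy model of a terms filter (SQL IN): exact comparison against a list of values;
    for an array field, any element may match."""
    actual = doc.get(field)
    candidates = actual if isinstance(actual, list) else [actual]
    return any(v in values for v in candidates)


# Hypothetical documents.
docs = [
    {"id": 1, "name": "小苍苍"},
    {"id": 2, "name": "小衣衣"},
    {"id": 3, "name": "麒麟臂"},
    {"id": 4, "name": ["麒麟臂", "小衣衣"]},
]
hits = [d["id"] for d in docs if terms_matches(d, "name", ["小苍苍", "小衣衣"])]
print(hits)  # [1, 2, 4]
```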

What if we want a particular user from 2002? And how do we express OR and AND conditions?

For that we need a more complex query: the compound bool filter.

```
{
   "bool" : {
      "must" :     [],  # AND
      "should" :   [],  # OR
      "must_not" : []   # NOT
   }
}
```

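- must: every clause must match, equivalent to AND.
- must_not: no clause may match, equivalent to NOT.
- should: at least one clause must match, equivalent to OR. Strictly, in query context this only holds when there is no must or filter clause; otherwise should clauses just raise the score.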
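The three bool sections can be sketched as plain boolean logic over predicates. This is a toy model that ignores scoring. It follows the query-context rule described above, where should is required only when must is absent. The documents and predicate names are hypothetical.

```python
from typing import Callable, Sequence

Predicate = Callable[[dict], bool]


def bool_matches(doc: dict,
                 must: Sequence[Predicate] = (),
                 should: Sequence[Predicate] = (),
                 must_not: Sequence[Predicate] = ()) -> bool:
    """Toy matching logic of a bool query in query context (scores ignored)."""
    if not all(p(doc) for p in must):      # must: AND
        return False
    if any(p(doc) for p in must_not):      # must_not: NOT
        return False
    if should and not must:                # should: OR, required only without must
        return any(p(doc) for p in should)
    return True                            # with must present, should only affects scoring


def year_2002(doc):
    return doc.get("year") == 2002


def named_qilin(doc):
    return doc.get("name") == "麒麟臂"


# Hypothetical documents.
docs = [
    {"id": 1, "year": 2002, "name": "小苍苍"},
    {"id": 2, "year": 2010, "name": "麒麟臂"},
    {"id": 3, "year": 2010, "name": "小衣衣"},
    {"id": 4, "year": 2002, "name": "麒麟臂"},
]
and_hits = [d["id"] for d in docs if bool_matches(d, must=[year_2002, named_qilin])]
or_hits = [d["id"] for d in docs if bool_matches(d, should=[year_2002, named_qilin])]
not_hits = [d["id"] for d in docs if bool_matches(d, must_not=[year_2002])]
print(and_hits, or_hits, not_hits)  # [4] [1, 2, 4] [2, 3]
```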

**select * from article where year=2002 and user_name like '%苍天空%'**

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "bool":{
            "must":[
                {
                    "term":{
                        "year":2002
                    }
                },
                {
                    "match":{
                        "user_name":"苍天空"
                    }
                }
            ]
        }
    }
}'
```

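Calling match the equivalent of LIKE is not accurate: what it matches depends on the analyzer configured for the field. To skip scoring, put the clauses in a filter context instead. For example, move them from must into the bool query's filter clause, or wrap the bool in constant_score.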
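A toy comparison shows why match is not LIKE. The model rests on two facts. The standard analyzer emits each Chinese character as its own token, and match uses the OR operator by default. The analysis step is a rough stand-in, not the real analyzer chain, and the sample values are hypothetical.

```python
def cjk_char_tokens(text):
    """Rough stand-in for the standard analyzer on Chinese text:
    each character becomes its own token (whitespace dropped)."""
    return [ch for ch in text if not ch.isspace()]


def match_or(field_text, query_text):
    """Toy match query with the default OR operator: any shared token is a hit."""
    return bool(set(cjk_char_tokens(field_text)) & set(cjk_char_tokens(query_text)))


def sql_like_contains(field_text, needle):
    """SQL LIKE '%needle%': the whole substring must appear."""
    return needle in field_text


print(match_or("小苍苍", "苍天空"))           # True: shares the token 苍
print(sql_like_contains("小苍苍", "苍天空"))  # False
```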

**select * from article where (year=2002 or name='麒麟臂') and user_name not like '%苍天空%'**

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "bool":{
            "should":[
                {
                    "term":{
                        "year":2002
                    }
                },
                {
                    "term":{
                        "name":"麒麟臂"
                    }
                }
            ],
            "must_not":{
                "match":{
                    "user_name":"苍天空"
                }
            }
        }
    }
}'
```

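Notice that must_not is not an array here because there is only one condition. When there are several conditions, turn must_not into an array.

For example (focus on the syntax only):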

```json
{
    "query":{
        "bool":{
            "should":[
                {
                    "term":{
                        "year":2002
                    }
                },
                {
                    "term":{
                        "name":"麒麟臂"
                    }
                }
            ],
            "must_not":[
                {
                    "match":{
                        "user_name":"苍天空"
                    }
                },
                {
                    "term":{
                        "job":"teacher"
                    }
                }
            ]
        }
    }
}
```
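A bool section can hold either a single clause object or an array of clauses. When you build queries in code, it is convenient to normalize to the array form before adding conditions. The sketch below turns the single-object must_not from the earlier example into the two-element array shown above. `as_clause_list` and `add_must_not` are hypothetical helpers, not part of any client library.

```python
def as_clause_list(clauses):
    """A bool section (must / should / must_not) may be a single clause object
    or an array of clauses; normalize both forms to a list."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def add_must_not(bool_query, clause):
    """Append a clause to must_not, turning a single object into an array if needed."""
    bool_query["must_not"] = as_clause_list(bool_query.get("must_not")) + [clause]
    return bool_query


q = {
    "should": [{"term": {"year": 2002}}, {"term": {"name": "麒麟臂"}}],
    "must_not": {"match": {"user_name": "苍天空"}},
}
add_must_not(q, {"term": {"job": "teacher"}})
```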

A more flexible should

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "bool":{
            "should":[
                {
                    "term":{
                        "id":1
                    }
                },
                {
                    "match":{
                        "user_name":"苍天空"
                    }
                },
                {
                    "match":{
                        "nick_name":"小苍苍"
                    }
                }
            ],
            "minimum_should_match":2
        }
    }
}'
```

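"minimum_should_match": 2 means at least two of the should clauses must match. If you don't need scoring, place the bool in a filter context, for example inside constant_score's filter.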
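A toy sketch of minimum_should_match: count how many should clauses match and compare with the threshold. The match clauses use the same rough "one shared character is enough" stand-in for the standard analyzer with the default OR operator. The documents are hypothetical.

```python
def shares_char(text, query):
    """Toy stand-in for match on Chinese text with the standard analyzer:
    one shared character (token) is enough with the default OR operator."""
    return bool(set(text) & set(query))


def min_should_match(doc, should, minimum):
    """Toy minimum_should_match: at least `minimum` should clauses must match."""
    return sum(1 for clause in should if clause(doc)) >= minimum


should = [
    lambda d: d["id"] == 1,                           # term id = 1
    lambda d: shares_char(d["user_name"], "苍天空"),  # match user_name
    lambda d: shares_char(d["nick_name"], "小苍苍"),  # match nick_name
]

# Hypothetical documents.
docs = [
    {"id": 1, "user_name": "苍天空", "nick_name": "大熊"},  # 2 clauses match
    {"id": 2, "user_name": "大熊", "nick_name": "小苍苍"},  # 1 clause matches
    {"id": 3, "user_name": "天空", "nick_name": "小苍"},    # 2 clauses match
]
hits = [d["id"] for d in docs if min_should_match(d, should, 2)]
print(hits)  # [1, 3]
```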

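There is another pattern that is very useful in practice: combining a scored query with a filter. Look at this query:

Sometimes we need an analyzed (full-text) match and an exact term in the same search. We want the term to act only as a filter, without affecting the score, while match ranks the results by relevance.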

```json
{
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "user_name":"小仓鼠"
                    }
                },
                {
                    "term":{
                        "id":1
                    }
                }
            ]
        }
    }
}
```

The query above is not optimal, because the term clause takes part in scoring. Let's optimize it:

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "user_name":"小苍苍"
                    }
                }
            ],
            "filter":{
                "term":{
                    "id":1
                }
            }
        }
    }
}'
```

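If you look at max_score, you will see that only the user_name match contributes to the score. This matters because ES can evaluate the filter without scoring and cache its result.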
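The difference between the two versions can be sketched with a toy score: the number of distinct query characters found in the field. This is not Lucene's TF/IDF similarity, which Elasticsearch 2.x actually uses. It only shows the structural point. When the term sits in must it adds to the score, and when it sits in filter it merely includes or excludes the document. The document and the term weight of 1 are hypothetical.

```python
def toy_score(text, query):
    """Toy relevance: how many distinct query characters appear in text
    (Elasticsearch 2.x really uses Lucene's TF/IDF similarity)."""
    return sum(1 for ch in set(query) if ch in text)


def score_must_only(doc, text, doc_id):
    """bool.must = [match, term]: both clauses add to the score (term weight 1 here)."""
    match = toy_score(doc["user_name"], text)
    if match == 0 or doc["id"] != doc_id:
        return None          # not a hit
    return match + 1         # the term clause also contributes


def score_must_plus_filter(doc, text, doc_id):
    """bool.must = [match], bool.filter = term: the term only includes or excludes."""
    match = toy_score(doc["user_name"], text)
    if match == 0 or doc["id"] != doc_id:
        return None
    return match             # the score comes from match alone


doc = {"id": 1, "user_name": "小苍苍"}
print(score_must_only(doc, "小苍苍", 1), score_must_plus_filter(doc, "小苍苍", 1))  # 3 2
```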

Range queries

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "query":{
        "constant_score":{
            "filter":{
                "range":{
                    "id":{
                        "gte":1,
                        "lte":4
                    }
                }
            }
        }
    }
}'
```
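The bounds in a range filter follow simple rules: gte and lte are inclusive, gt and lt are exclusive, and an omitted bound leaves that side open. A toy Python predicate makes the boundary behavior concrete. The ids are hypothetical.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Toy range filter: gte/lte are inclusive, gt/lt exclusive; missing bounds are open."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


ids = list(range(0, 7))  # hypothetical ids 0..6
inclusive = [i for i in ids if in_range(i, gte=1, lte=4)]
exclusive = [i for i in ids if in_range(i, gt=1, lt=4)]
print(inclusive, exclusive)  # [1, 2, 3, 4] [2, 3]
```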

Finally: pagination and returning specific fields

```bash
curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d '{
    "from":1,
    "size":1,
    "query":{
        "terms":{
            "id":[
                1,
                2,
                6,
                9,
                15
            ]
        }
    },
    "sort":{
        "id":{
            "order":"desc"
        }
    }
}'
```

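Here we used from+size pagination. from is a zero-based offset, so "from": 1 skips the first hit. Note that from+size pagination has limitations, which we will cover later. We also used sort to order the results by id in descending order.

In database work we also usually return only certain columns, so try to avoid select *.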
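The paginated query can be modeled as filter, then sort, then slice. This is a toy sketch that assumes all five requested ids exist in the index. The documents are hypothetical, and real Elasticsearch sorting and paging happen across shards.

```python
def terms_sort_page(docs, ids, from_, size, descending=True):
    """Toy version of terms on id + sort on id + from/size (from_ is a zero-based offset)."""
    wanted = set(ids)
    hits = [d for d in docs if d["id"] in wanted]
    hits.sort(key=lambda d: d["id"], reverse=descending)
    return hits[from_:from_ + size]


docs = [{"id": i} for i in range(1, 20)]  # hypothetical: ids 1..19 exist
page = terms_sort_page(docs, [1, 2, 6, 9, 15], from_=1, size=1)
# sorted desc: 15, 9, 6, 2, 1 -> skip one hit -> take one
print(page)  # [{'id': 9}]
```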

```json
{
    "from":1,
    "size":1,
    "_source":[
        "id",
        "name"
    ],
    "query":{
        "terms":{
            "id":[
                1,
                2,
                6,
                9,
                15
            ]
        }
    },
    "sort":{
        "id":{
            "order":"desc"
        }
    }
}
```

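Just use _source. For a nested object, you can also use userinfo.* to return every field under the userinfo object.

That's it for now. Next, we will look in detail at how Elasticsearch scoring works and how to control it more precisely for more customized recommendations.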
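The effect of _source includes, including wildcard patterns such as userinfo.*, can be approximated by matching dotted field paths with shell-style wildcards. This is a toy sketch that only looks at leaf fields. The document and its userinfo fields are hypothetical, made up for illustration.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """{"a": {"b": 1}} -> {"a.b": 1}"""
    flat = {}
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """{"a.b": 1} -> {"a": {"b": 1}}"""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


def filter_source(source, includes):
    """Toy _source includes: keep leaf fields whose dotted path matches any pattern."""
    kept = {path: value for path, value in flatten(source).items()
            if any(fnmatchcase(path, pattern) for pattern in includes)}
    return unflatten(kept)


# Hypothetical document; the userinfo fields are made up for illustration.
doc = {"id": 9, "name": "小苍苍", "year": 2002,
       "userinfo": {"age": 18, "gender": "f"}}
print(filter_source(doc, ["id", "name"]))  # {'id': 9, 'name': '小苍苍'}
print(filter_source(doc, ["userinfo.*"]))  # {'userinfo': {'age': 18, 'gender': 'f'}}
```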

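This article is part of the Tencent Cloud self-media sync program and was shared from the WeChat official account 呆呆熊的技术路.

Originally published: 2019-03-17. In case of infringement, contact cloudcommunity@tencent.com for removal.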
