
Elasticsearch: searching order numbers with a custom analyzer

XING辋
Published 2019-03-26 10:56:45
Included in the column: M莫的博客

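An edge n-gram filter splits each token further into its leading n-grams (its prefixes), and indexing those n-grams makes prefix search possible. For example, the order number 'OD5046240000014238' is broken into 'O', 'OD', 'OD5', 'OD50', …, 'OD5046240000014238', which supports prefix search or search suggestions.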
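To make the prefix splitting concrete, here is a small bash sketch that mimics what an edge n-gram filter emits for a single token. The function `edge_grams` is hypothetical and only models the filter's output, not Elasticsearch itself; its second and third arguments stand in for the filter's `min_gram` and `max_gram` settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emulates the output of an edge n-gram filter for one
# token, i.e. every prefix whose length lies in [min_gram, max_gram].
edge_grams() {
  local token="$1" min_gram="$2" max_gram="$3"
  local n
  for ((n = min_gram; n <= max_gram && n <= ${#token}; n++)); do
    # ${token:0:n} is the first n characters of the token
    printf '%s\n' "${token:0:n}"
  done
}

# Prefixes of the sample order number, from 1 character up to the whole token
edge_grams OD5046240000014238 1 25
```

With `1 25` it prints the 18 prefixes 'O', 'OD', 'OD5', … up to the full order number, one per line.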

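In my business system, however, an order number such as OD5046240000014238 ends with the last four digits of the user's ID, and users often need to look up their order list by typing only the trailing digits. The tokenization I need is therefore a set of suffixes: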


4238 14238 014238 0014238 ... 46240000014238 046240000014238 5046240000014238 D5046240000014238 OD5046240000014238

Custom analyzer

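  • Create the index and specify the analyzer. Wrapping the edge_ngram filter between two reverse filters turns its prefixes into suffixes: the token is reversed, its leading grams of 4 to 25 characters are taken, and each gram is reversed back. With min_gram set to 4, the shortest indexed suffix is the four-digit user-ID part.

```shell
curl -XPUT -H "Content-Type:application/json" 'http://localhost:9200/myindex' -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "order_no_edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 4,
          "max_gram": 25
        }
      },
      "analyzer": {
        "order_no_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "reverse",
            "order_no_edge_ngram_filter",
            "reverse"
          ]
        }
      }
    }
  }
}'
```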
  • Test the analyzer

```shell
curl -XPOST -H "Content-Type:application/json" 'http://localhost:9200/myindex/_analyze' -d '
{
  "text": "OD5046240000014238",
  "analyzer": "order_no_analyzer"
}'
```

Response (every gram keeps the original token's offsets, 0 to 18, and position 0):


```json
{
  "tokens": [
    { "token": "4238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "14238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "0014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "00014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "0000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "40000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "240000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "6240000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "46240000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "046240000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "5046240000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "D5046240000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 },
    { "token": "OD5046240000014238", "start_offset": 0, "end_offset": 18, "type": "<ALPHANUM>", "position": 0 }
  ]
}
```
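To preview these tokens without a running cluster, the filter chain can be reproduced locally. This is a minimal bash sketch assuming ASCII order numbers; `reverse_str` and `order_no_grams` are hypothetical helpers that model only the `reverse`, `order_no_edge_ngram_filter` (min_gram 4, max_gram 25), `reverse` filters applied to one token, not the `standard` tokenizer or Elasticsearch itself.

```shell
#!/usr/bin/env bash
# Reverse a string character by character (pure bash, ASCII input assumed)
reverse_str() {
  local s="$1" out="" i
  for ((i = ${#s} - 1; i >= 0; i--)); do
    out+="${s:i:1}"
  done
  printf '%s' "$out"
}

# Hypothetical emulation of: reverse -> edge_ngram(min_gram, max_gram) -> reverse
order_no_grams() {
  local token="$1" min_gram="${2:-4}" max_gram="${3:-25}"
  local reversed n
  reversed="$(reverse_str "$token")"   # step 1: reverse the whole token
  for ((n = min_gram; n <= max_gram && n <= ${#reversed}; n++)); do
    # step 2: the edge n-gram is a prefix of the reversed token
    # step 3: reversing it back yields a suffix of the original token
    printf '%s\n' "$(reverse_str "${reversed:0:n}")"
  done
}

order_no_grams OD5046240000014238
```

Running it prints the same 15 grams as the `_analyze` response, from '4238' to 'OD5046240000014238'. The sketch covers index-time analysis only; if the same analyzer is also applied to the query text, the query is split into suffixes as well, which is why fields like this are often mapped with a separate `search_analyzer`.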


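This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.
Originally published: 2018-12-30. In case of infringement, contact cloudcommunity@tencent.com for removal.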
