Using edge ngram
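Each word is split further into n-grams, and the resulting n-grams are used to implement prefix search. For example, an order number such as 'OD5046240000014238' is broken down into 'O', 'OD', 'OD5', 'OD50' … 'OD5046240000014238', which makes prefix search or search suggestions possible.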
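To make the decomposition concrete, here is a minimal Python sketch of what an edge n-gram step does to a single token. It only illustrates the idea and is not Elasticsearch's implementation; the function name `edge_ngrams` and its `min_gram`/`max_gram` parameters are hypothetical, chosen to mirror the usual n-gram length bounds.

```python
from typing import List, Optional


def edge_ngrams(token: str, min_gram: int = 1, max_gram: Optional[int] = None) -> List[str]:
    """Return the prefixes of `token` whose length lies in [min_gram, max_gram]."""
    if max_gram is None:
        max_gram = len(token)
    # Never ask for a prefix longer than the token itself.
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


grams = edge_ngrams("OD5046240000014238")
print(grams[:4])  # ['O', 'OD', 'OD5', 'OD50']
print(grams[-1])  # OD5046240000014238
```

Because every prefix is indexed as its own term, a query for "OD50" becomes an exact term match against the index, which is why this technique suits prefix search and suggestions.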
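In my business system, however, users often need to fuzzy-match the order list using the last few characters of an order number such as OD5046240000014238 (its last four digits are the last four digits of the userid). The tokenization I need is as follows.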
```
4238
14238
014238
0014238
...
46240000014238
046240000014238
5046240000014238
D5046240000014238
OD5046240000014238
```
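The suffix tokens listed above can be produced by reversing the token, taking edge n-grams (prefixes) of the reversed string, and reversing each result back. The Python sketch below simulates that chain outside Elasticsearch. The name `suffix_ngrams` is hypothetical, and the bounds `min_gram=4` and `max_gram=18` are assumptions inferred from the shortest and longest tokens in the list, not values stated in the text.

```python
from typing import List


def suffix_ngrams(token: str, min_gram: int = 4, max_gram: int = 18) -> List[str]:
    """Return the suffixes of `token` with length in [min_gram, max_gram], shortest first."""
    reversed_token = token[::-1]                      # step 1: reverse the token
    upper = min(max_gram, len(reversed_token))
    prefixes = [reversed_token[:n]                    # step 2: edge n-grams of the reversed token
                for n in range(min_gram, upper + 1)]
    return [p[::-1] for p in prefixes]                # step 3: reverse each n-gram back


tokens = suffix_ngrams("OD5046240000014238")
print(tokens[0])    # 4238
print(tokens[1])    # 14238
print(tokens[-1])   # OD5046240000014238
print(len(tokens))  # 15
```

With these assumed bounds, a user typing the last four or more characters of an order number matches one of the indexed terms exactly.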
```
curl -XPOST -H "Content-Type:application/json" 'http://localhost:9200/myindex/_analyze' -d '
{
  "text":"OD5046240000014238",
  "analyzer":"order_no_analyzer"
}
'
```
Response:
```
{
  "tokens": [
    {
      "token": "4238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "14238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "0014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "00014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "0000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "40000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "240000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "6240000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "46240000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "046240000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "5046240000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "D5046240000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "OD5046240000014238",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
```
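The definition of `order_no_analyzer` is not shown in this part of the text. One configuration that would be expected to yield suffix tokens like those in the response chains Elasticsearch's built-in `reverse` and `edge_ngram` token filters. The sketch below builds such index settings as a Python dict and prints them as JSON. The filter name `order_no_edge_ngram`, the `standard` tokenizer, and the gram bounds 4 and 18 are all assumptions inferred from the response, not the author's confirmed settings; older Elasticsearch versions spell the filter type `edgeNGram`.

```python
import json

# Hypothetical index settings; every name and value here is an assumption.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Edge n-gram filter applied to the reversed token.
                "order_no_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 4,   # shortest token in the response has 4 characters
                    "max_gram": 18,  # full order number has 18 characters
                }
            },
            "analyzer": {
                "order_no_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # reverse, take prefixes, reverse back: the result is suffixes
                    "filter": ["reverse", "order_no_edge_ngram", "reverse"],
                }
            },
        }
    }
}

# The printed JSON could be used as the body of an index-creation request.
print(json.dumps(index_settings, indent=2))
```

A setup like this usually pairs the n-gram analyzer at index time with a plain analyzer at search time, so that the user's input is matched as a single term rather than being split into n-grams itself.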