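Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/weixin_42528266/article/details/102864277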
Summary: a look at how Elasticsearch analyzes (tokenizes) text.
PUT test/_doc/1
{
  "msg": "乔丹是篮球之神"
}

POST /test/_search
{
  "query": {
    "match": {
      "msg": "乔丹"
    }
  }
}
The query matches the document. So how does the whole process actually work?
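As a rough mental model (a toy sketch in Python, not Elasticsearch's actual implementation): the field text is analyzed into terms at index time, the terms are stored in an inverted index, and a match query analyzes its input and looks up the resulting terms. All function names below are hypothetical, and the per-character analyzer is only a stand-in for how the default analyzer treats Chinese text.

```python
from collections import defaultdict


def per_char_analyzer(text):
    # Assumption: stand-in for the default analyzer on Chinese text,
    # emitting one token per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def build_index(docs, analyzer):
    # Inverted index: term -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index


def search(index, query, analyzer):
    # Analyze the query, then collect documents that contain ANY query term
    # (a match query combines its terms with OR by default).
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "乔丹是篮球之神"}
index = build_index(docs, per_char_analyzer)
print(search(index, "乔丹", per_char_analyzer))  # {1}
```

The lookup succeeds only because the query terms and the indexed terms were produced by compatible analysis, which is the point the rest of the article explores.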
PUT test/_mapping
{
  "properties": {
    "msg_chinese": {
      "type": "text",
      "analyzer": "ik_max_word"
    }
  }
}

POST test/_doc/1
{
  "msg": "乔丹是篮球之神",
  "msg_chinese": "乔丹是篮球之神"
}
POST /test/_search
{
  "query": {
    "match": {
      "msg_chinese": "乔"
    }
  }
}

POST /test/_search
{
  "query": {
    "match": {
      "msg": "乔"
    }
  }
}
Why does the same input, 乔, match the document on msg but not on msg_chinese?
POST test/_analyze
{
  "field": "msg",
  "text": "乔丹是篮球之神"
}
Tokenization result:
乔, 丹, 是, 篮, 球, 之, 神
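POST test/_analyze
{
  "field": "msg_chinese",
  "text": "乔丹是篮球之神"
}
Tokenization result:
乔丹, 是, 篮球, 之神
The default analyzer on msg indexes every character as its own term, so a query for 乔 finds a matching term. ik_max_word indexes 乔丹 as a single term, so msg_chinese contains no standalone 乔 and the same query finds nothing.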
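The two token lists can be checked against the query with a small Python sketch. This is a toy term lookup, not an Elasticsearch call. The lists are copied from the two _analyze results; `term_matches` is a hypothetical helper, and the sketch assumes the query 乔 is analyzed to the single term 乔 on both fields.

```python
# Terms indexed for each field, copied from the two _analyze results.
msg_terms = ["乔", "丹", "是", "篮", "球", "之", "神"]   # default analyzer
msg_chinese_terms = ["乔丹", "是", "篮球", "之神"]       # ik_max_word


def term_matches(query_terms, indexed_terms):
    # A match succeeds if at least one query term equals an indexed term exactly.
    indexed = set(indexed_terms)
    return any(term in indexed for term in query_terms)


# Assumption: the query "乔" is analyzed to the single term "乔" on both fields.
query_terms = ["乔"]

print(term_matches(query_terms, msg_terms))          # True
print(term_matches(query_terms, msg_chinese_terms))  # False
```

Matching is done on whole terms, not on substrings, which is why 乔 does not hit the indexed term 乔丹.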
The analyzer applied to the query text can also be overridden at search time:
POST test/_search
{
  "query": {
    "match": {
      "msg_chinese": {
        "query": "乔丹",
        "analyzer": "standard"
      }
    }
  }
}
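The source does not show the response to this request, so the following Python sketch only illustrates what the override presumably does. It assumes that the standard analyzer splits 乔丹 into 乔 and 丹 (as in the msg result earlier) and that ik_max_word keeps 乔丹 as one term (as in the msg_chinese result). The dictionary and the `matches` helper are hypothetical.

```python
# Terms indexed for msg_chinese, copied from the _analyze result for that field.
indexed_terms = {"乔丹", "是", "篮球", "之神"}

# Assumed query-time terms for the query text "乔丹" under each analyzer.
query_terms_by_analyzer = {
    "ik_max_word": ["乔丹"],   # same analysis as at index time
    "standard": ["乔", "丹"],  # one term per character
}


def matches(analyzer_name):
    # True if any query-time term equals one of the indexed terms.
    terms = query_terms_by_analyzer[analyzer_name]
    return any(term in indexed_terms for term in terms)


print(matches("ik_max_word"))  # True
print(matches("standard"))     # False
```

Under these assumptions the query would stop matching once the search-time analyzer no longer produces the terms that were written at index time.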
The _analyze API also accepts an analyzer name directly, without going through a field mapping:
POST _analyze
{
  "text": "eating an apple",
  "analyzer": "whitespace"
}
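For intuition, the whitespace analyzer simply splits the text on whitespace and leaves each token unchanged. The Python sketch below imitates that behavior. `whitespace_tokens` is a hypothetical helper, and the dictionary keys mirror only some of the fields in an _analyze response; it is not the real response.

```python
def whitespace_tokens(text):
    # Split on whitespace only, with no lowercasing or stemming,
    # and record each token's character offsets and position.
    tokens = []
    offset = 0
    for word in text.split():
        start = text.index(word, offset)
        end = start + len(word)
        tokens.append({
            "token": word,
            "start_offset": start,
            "end_offset": end,
            "position": len(tokens),
        })
        offset = end
    return tokens


tokens = whitespace_tokens("eating an apple")
print([t["token"] for t in tokens])  # ['eating', 'an', 'apple']
```

Because nothing is normalized, a query analyzed this way would have to contain exactly "eating" to match that token; "eat" or "Eating" would be different terms.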