Elasticsearch Index and Mapping (Part 2)

User 3467126

Published 2020-02-25 11:59:53

Included in the column: 爱编码

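Preface

This article uses Elasticsearch 6.5.4; most of the basic commands and operations carry over to other versions. Note that the examples use mapping types (such as doc), which Elasticsearch 7.x and later no longer accept in this form. The comparison between MySQL and Elasticsearch helps make sense of the create, update, and delete operations that follow.

All operations in this article are performed in Kibana, so you need it installed; see the previous article for installation instructions.

Index Operations

For now, think of an index as the equivalent of a database in MySQL. There are many more involved concepts behind it, but they are left aside here; this article focuses on operations.

「A template for creating an index:」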

PUT /my_index_name_v1?pretty
{
  "aliases": {
    "my_index_name": {}
  },
  "settings": {
    "index": {
      "refresh_interval": "10s",
      "number_of_shards": "12",
      "number_of_replicas": "1",
      "search.slowlog.threshold.query.warn": "5s",
      "search.slowlog.threshold.query.info": "1s",
      "search.slowlog.threshold.fetch.warn": "1s",
      "search.slowlog.threshold.fetch.info": "800ms",
      "indexing.slowlog.threshold.index.warn": "12s",
      "indexing.slowlog.threshold.index.info": "5s"
    }
  },
  "mappings": {
    "my_type_name": {
      "properties": {
        "xxx_id": {
          "type": "keyword"
        },
        "timestamp": {
          "type": "long"
        },
        "@timestamp": {
          "type": "date"
        },
        "xxx_status": {
          "type": "integer"
        },
        "xxx_content": {
          "type": "text"
        }
      }
    }
  }
}

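Template parameters explained:

1. PUT: note that this is the PUT method. Elasticsearch's HTTP API follows REST conventions closely, and creating a resource is a PUT. When using other tools, make sure you pick the right method.

2. Index name: if the business only needs one fixed index, it is strongly recommended to give the index name a _v1 suffix. This makes a later reindex easier, for example when the number of primary shards or the type of a field has to change. Without the suffix and without an alias, a reindex forces the business to stop writing and to change its code to access the index through an alias; at that point the new index will probably be named xxx_v1 anyway, which is extra work. In short, name the index your_indexname_v1 and use index_name as its alias.

3. The pretty flag: recommended but not required. It formats the JSON response for readability.

4. refresh_interval: how often the index is refreshed so that newly written data becomes searchable. If the write volume is large, or the business does not need changes to be visible immediately, use a longer interval. As a rough rule of thumb: if daily writes exceed 100 GB, set it to at least 10s; for 500 GB, 60s; for 1 TB or more, 120s. The right value depends on the cluster hardware and on the read and write load across all indices at the time.

5. number_of_shards and number_of_replicas: the number of primary shards and of replica shards. The recommendation here is 12 primary shards and 1 replica. For details see: https://blog.csdn.net/tanruixing/article/details/87883896

6. Slow log settings: set the search and indexing thresholds according to how the business accesses the index. The one thing to watch is not to set them too low, otherwise the logs may fill the disk and even affect data storage. Setting them is strongly recommended, because they make it easy to observe how the business uses the index later on.

7. Type name: in general, use one type per index. If there are several types, most of their fields should be the same. If the fields are entirely different, split the data into multiple indices, each with the type named after its index, to avoid wasting storage on sparse fields.

8. Fields whose names contain id: keyword is recommended. If the business can guarantee the values are always numeric, long can be used instead.

9. Timestamps: long is recommended for convenient access from business code, or date for convenient use in Kibana and Grafana.

10. status or type fields: integer is recommended, which suits enumerated values.

11. content fields: decide on the analyzer and use the text type. keyword is not recommended, especially when the values are long.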

https://blog.csdn.net/tanruixing/article/details/88426009
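
The naming advice in item 2 pays off at reindex time. As a rough sketch (Python is used here only to build and print the JSON bodies; my_index_name_v2 is a hypothetical new index that is assumed to have been created beforehand with the corrected settings), a reindex followed by an alias switch might look like this. The bodies target the _reindex and _aliases endpoints; the two alias actions are applied together, so clients that use the alias never see a gap.

```python
import json


def reindex_body(source_index, dest_index):
    """Body for POST _reindex: copy documents from the old index into the new one."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_body(alias, old_index, new_index):
    """Body for POST _aliases: remove and add are applied in one atomic call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# my_index_name_v2 is a hypothetical name used only for illustration.
steps = [
    ("POST _reindex", reindex_body("my_index_name_v1", "my_index_name_v2")),
    ("POST _aliases",
     alias_swap_body("my_index_name", "my_index_name_v1", "my_index_name_v2")),
]

for request_line, body in steps:
    print(request_line)
    print(json.dumps(body, indent=2))
```

Because the business code only ever talks to my_index_name, nothing on the client side has to change after the switch.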

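Mapping Data Types

1. Core data types

String: text, keyword (not analyzed)

Numeric: long, integer, short, byte, double, float, half_float, and others

Date: date

Boolean: boolean

Binary: binary

Range: integer_range, float_range, long_range, double_range, date_range

2. Complex data types

Array: there is no dedicated array type; any field can hold zero or more values of the same type

Object: object

Nested: nested

Geo: geo_point, geo_shape

Specialized: ip (IP addresses), completion (autocomplete), token_count (number of tokens in a string), murmur3 (hash of the value; provided by a plugin)

3. Multi-fields

Multi-fields allow the same field to be indexed with different configurations, for example with different analyzers.

A common example is pinyin search on person names. One option is to add a separate pinyin field next to the name, but that is not very elegant. With multi-fields you can add a sub-field to the name field without changing the overall document structure.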
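
A minimal sketch of such a sub-field follows. Python is used only to build and print the request bodies. The index name my_index_multi, the field name username, and the search term are made up for illustration, and the pinyin analyzer is assumed to come from a separately installed analysis plugin; it is not part of a stock Elasticsearch install.

```python
import json

# Assumptions: index, type, and field names are illustrative only, and the
# "pinyin" analyzer is assumed to be provided by an installed plugin.
mapping_body = {
    "mappings": {
        "doc": {
            "properties": {
                "username": {
                    "type": "text",
                    # The sub-field is addressed as username.pinyin in queries.
                    "fields": {
                        "pinyin": {"type": "text", "analyzer": "pinyin"}
                    },
                }
            }
        }
    }
}

# Search the sub-field; documents still only carry a single username value.
search_body = {"query": {"match": {"username.pinyin": "liudehua"}}}

print("PUT my_index_multi")
print(json.dumps(mapping_body, indent=2))
print("GET my_index_multi/_search")
print(json.dumps(search_body, indent=2))
```

The document itself keeps one username value; only the mapping gains the extra sub-field.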

Dynamic mapping

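As mentioned earlier, if the index does not exist when a document is written, Elasticsearch creates it automatically. But how does Elasticsearch decide the types of the index's fields?

1. Elasticsearch can detect field types automatically

Elasticsearch infers each field's type from the JSON type of its value in the document.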
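
The detection rules can be approximated with a small function. This is a toy model written from my understanding of the documented default behavior, not Elasticsearch's actual code: the function name is hypothetical, the date check is a deliberately loose regular expression, and numeric detection for strings (off by default) is left out.

```python
import re

# Loose stand-in for the default date formats; real detection is stricter.
_DATE_LIKE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})?)?$"
    r"|^\d{4}/\d{2}/\d{2}$"
)


def guess_field_type(value, date_detection=True):
    """Return the type dynamic mapping would likely pick, or None if no field is added."""
    if value is None:
        return None  # null values do not add a field
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        first = next((item for item in value if item is not None), None)
        return guess_field_type(first, date_detection)
    if isinstance(value, str):
        if date_detection and _DATE_LIKE.match(value):
            return "date"
        return "text (with a keyword sub-field)"
    raise TypeError("not a JSON value: %r" % (value,))


sample = {"name": "Tom", "age": 3, "score": 1.5, "active": True,
          "create_time": "2016-09-21", "tags": [None, "a"], "missing": None}
for field, value in sample.items():
    print(field, "->", guess_field_type(value))
```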

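2. Automatic date detection

The date formats used for automatic detection are configurable. The defaults are:

["strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]

strict_date_optional_time is the ISO standard date format. Its full form is:

YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00)

How do you set your own formats?

dynamic_date_formats: defines custom date formats;

date_detection: turns automatic date detection off (it is on by default).

### Custom date formats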
PUT test_index
{
  "mappings": {
    "doc": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}

PUT test_index/doc/1
{
  "create_time": "09/21/2016"
}

GET test_index/_mapping


GET test_index/doc/1

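### Disable automatic date detection
### (test_index already exists from the previous example, so delete it first)
DELETE test_index
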
PUT test_index
{
  "mappings": {
    "doc": {
      "date_detection": false
    }
  }
}

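Mapping Operations

A mapping is similar to a table schema definition in a database. It was touched on above, but not in enough detail, so this section continues the topic.

1. Viewing a mapping

GET /[index_name]/_mapping

2. Creating a mapping

Once a field's type is set in a mapping, it cannot be changed directly, because the inverted index built by Lucene cannot be modified after it has been generated. Instead, create a new index and reindex into it. New fields can still be added, though, and the dynamic parameter controls how new fields are handled. Its values are:

true: the default; new fields are added to the mapping automatically;

false: new fields are not added to the mapping; the document is still written, but the new fields cannot be searched (they remain in _source);

strict: strict mode; the document is rejected with an error.

「Example」

## Create the mapping, i.e. the table schema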
PUT my_index
{
  "mappings": {
    "doc": {
      "dynamic": false,
      "properties": {
        "title": {
          "type": "text"
        },
        "name": {
          "type": "keyword"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}

## View the mapping
GET my_index/_mapping

### Finally, try writing a document
PUT my_index/doc/1
{
  "title": "hello world",
  "desc": "this is book"
}
## Run a query
GET my_index/doc/_search
{
  "query": {
    "match": {
      "desc": "book"
    }
  }
}

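The search returns no hits: because dynamic is false, desc was never added to the mapping, so it cannot be queried.

You can try strict mode in the same way.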
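
To compare the three settings side by side, here is a toy model in Python. It only mimics the outcome described above (which fields end up mapped and searchable); the function name is hypothetical and the error is a plain ValueError, whereas a real cluster answers a strict violation with its own exception type.

```python
def write_document(mapped_fields, doc, dynamic=True):
    """Toy model of the dynamic setting.

    Returns (fields in the mapping afterwards, searchable fields of this doc),
    or raises ValueError when dynamic is "strict" and the doc has unmapped fields.
    """
    unknown = [name for name in doc if name not in mapped_fields]
    if dynamic == "strict" and unknown:
        raise ValueError("strict mode rejects unmapped fields: %s" % unknown)
    if dynamic is True:
        mapped = set(mapped_fields) | set(unknown)  # new fields join the mapping
    else:
        mapped = set(mapped_fields)  # the doc is stored, the mapping is unchanged
    searchable = sorted(name for name in doc if name in mapped)
    return sorted(mapped), searchable


existing = ["title", "name", "age"]
document = {"title": "hello world", "desc": "this is book"}

for mode in (True, False, "strict"):
    try:
        print(mode, "->", write_document(existing, document, dynamic=mode))
    except ValueError as error:
        print(mode, "->", error)
```

With dynamic set to false, desc is missing from the searchable list, which matches the empty search result above.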

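3. The copy_to parameter

copy_to copies a field's value into a target field, which gives an effect similar to _all. The copied value does not appear in _source; it can only be used for searching.

## 1. The content of full_name is copied from first_name and last_name.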
PUT my_index4
{
  "mappings": {
    "doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name" : {
          "type": "text"
        }
      }
    }
  }
}

## 2. Add data
PUT my_index4/doc/1
{
  "first_name": "john",
  "last_name": "smith"
}

## 3. Search for john smith; a document is returned only if it contains both terms.
GET my_index4/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "john smith",
        "operator": "and"
      }
    }
  }
}

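The query matches the document added in step 2, because full_name holds both john and smith.

4. The index parameter

The index parameter controls whether a field is indexed. The default is true; false means the field is not indexed and therefore cannot be searched. If you store fields in Elasticsearch that you do not want to be searchable, such as ID card numbers or phone numbers, set index to false for them. This gives some measure of security and also saves space.

Example: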

PUT my_index5
{
  "mappings": {
    "doc": {
      "properties": {
        "cookie": {
          "type": "text",
          "index": false
        },
        "content": {
          "type": "text",
          "index": true
        }
      }
    }
  }
}

PUT my_index5/doc/1
{
  "cookie": "name=mike",
  "content": "hello world"
}

## Searching on cookie fails because the field is not indexed
GET my_index5/_search
{
  "query": {
    "match": {
      "cookie": "mike"
    }
  }
}

## Searching on content works as usual
GET my_index5/_search
{
  "query": {
    "match": {
      "content": "hello"
    }
  }
}

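5. The index_options parameter

index_options controls what is recorded in the inverted index. There are four settings:

docs: records only the doc id

freqs: records the doc id and term frequencies

positions: records the doc id, term frequencies, and term positions

offsets: records the doc id, term frequencies, term positions, and character offsets

The default for text fields is positions; for other types it is docs. The more that is recorded, the more space the index takes.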
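
As an illustration, a mapping that raises a text field from the default positions to offsets might look like the sketch below. Python only builds and prints the request body; the index name my_index6 and the field name are assumptions made for this example.

```python
import json

# Assumption: my_index6 and the content field are illustrative names only.
mapping_body = {
    "mappings": {
        "doc": {
            "properties": {
                "content": {
                    "type": "text",
                    # Record doc ids, term frequencies, positions, and
                    # character offsets: the most detailed, largest option.
                    "index_options": "offsets",
                }
            }
        }
    }
}

print("PUT my_index6")
print(json.dumps(mapping_body, indent=2))
```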

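6. The null_value parameter

This parameter defines how a null value in a field is handled. The default is null, meaning the value is treated as missing and Elasticsearch ignores it, so it is neither indexed nor searchable. Setting null_value gives the field a substitute value that is indexed in place of an explicit null.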
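
A short sketch of how this might be used, again with Python only assembling the request bodies. The index name my_index7, the field status_code, and the placeholder string "NULL" are assumptions for the example; the substitute has to be of the same type as the field, and it only affects what is indexed, not what is stored in _source.

```python
import json

# Assumption: index and field names and the "NULL" placeholder are illustrative.
mapping_body = {
    "mappings": {
        "doc": {
            "properties": {
                "status_code": {"type": "keyword", "null_value": "NULL"}
            }
        }
    }
}

# An explicit null is indexed as "NULL"; Python's None serializes to JSON null.
document = {"status_code": None}

# A term query on the placeholder then finds documents with an explicit null.
search_body = {"query": {"term": {"status_code": "NULL"}}}

for request_line, body in [
    ("PUT my_index7", mapping_body),
    ("PUT my_index7/doc/1", document),
    ("GET my_index7/_search", search_body),
]:
    print(request_line)
    print(json.dumps(body, indent=2))
```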

Dynamic Templates

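Dynamic templates let you 「set field types dynamically based on the data type Elasticsearch detects, the field name, and so on」. (It works a little like a regular expression: a field that matches the rule gets the type the template defines.)

They make the following possible:

1) map all strings as keyword, i.e. not analyzed by default;

2) map all fields whose names start with message as text, i.e. analyzed;

3) map all fields whose names start with long_ as long;

4) map everything detected as double to float, to save space.

The matching rules are:

match_mapping_type: matches the field type Elasticsearch detected, such as boolean or long;

match, unmatch: match or exclude field names;

path_match, path_unmatch: match or exclude full dotted field paths.

Examples:

### 1. By default, a string is mapped as text with a keyword sub-field.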

PUT test_index/doc/1
{
  "name": "Tom"
}

GET test_index/_mapping

DELETE test_index

### 2. Matching on detected type
### The name field is now mapped as keyword
PUT test_index
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

PUT test_index/doc/1
{
  "name": "Tom"
}

GET test_index/_mapping

DELETE test_index

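### 3. Matching on field name
### String fields whose names start with message are mapped as text; all other strings become keyword.
### Dynamic templates are evaluated top to bottom; after the first match, the remaining rules are skipped.
PUT test_index
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "message_as_text": {
            "match_mapping_type": "string",
            "match": "message*",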
            "mapping": {
              "type": "text"
            }
          }
        },
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}



PUT test_index/doc/1
{
  "name": "john",
  "message": "good boy"
}

GET test_index/_mapping

DELETE test_index

### 4. Map double as float to save space
PUT test_index
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "double_as_float": {
            "match_mapping_type": "double",
            "mapping": {
              "type": "float"
            }
          }
        }
      ]
    }
  }
}


PUT test_index/doc/1
{
  "name": 1.23,
  "message": "good boy"
}
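
The first-match-wins ordering of dynamic templates can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name is hypothetical, only match_mapping_type, match, and unmatch are handled (path_match and path_unmatch are left out), and fnmatch is used as a stand-in for the wildcard matching on field names, with a pattern such as message* selecting every field that starts with message.

```python
from fnmatch import fnmatchcase


def resolve_template(templates, field_name, detected_type):
    """Return (template name, mapping) for the first template that matches, else None."""
    for template in templates:
        (name, rule), = template.items()  # each entry holds exactly one named rule
        wanted_type = rule.get("match_mapping_type")
        if wanted_type not in (None, "*", detected_type):
            continue
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        if "unmatch" in rule and fnmatchcase(field_name, rule["unmatch"]):
            continue
        return name, rule["mapping"]  # later templates are never consulted
    return None


templates = [
    {"message_as_text": {"match_mapping_type": "string",
                         "match": "message*",
                         "mapping": {"type": "text"}}},
    {"strings_as_keywords": {"match_mapping_type": "string",
                             "mapping": {"type": "keyword"}}},
]

for field, detected in [("message", "string"), ("name", "string"), ("age", "long")]:
    print(field, "->", resolve_template(templates, field, detected))
```

Swapping the two templates would make strings_as_keywords catch every string first, so message fields would never reach the text rule.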

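Custom Mapping

「Method 1」: First index your JSON data into Elasticsearch, then view and copy the generated mapping, change the index name or adjust the field types as needed, and run the final mapping to create the new index. After that, delete the temporary index and you are done. A detailed example:

### 1. Import the data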
PUT test_index/doc/1
{
  "referre": "-",
  "response_code": "200",
  "remote_ip": "172.0.0.1",
  "method": "POST",
  "username": "-",
  "http_version": "1.1",
  "body_sent": {
    "bytes": "0"
  },
  "url": "/helloworld"
}

GET test_index/_mapping

### 2. Change the index name and recreate the mapping
PUT product_index
{
  "mappings" : {
      "doc" : {
        "dynamic_templates" : [
          {
            "double_as_float" : {
              "match_mapping_type" : "double",
              "mapping" : {
                "type" : "float"
              }
            }
          }
        ],
        "properties" : {
          "body_sent" : {
            "properties" : {
              "bytes" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "http_version" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "message" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "method" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : {
            "type" : "float"
          },
          "referre" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "remote_ip" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "response_code" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "url" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "username" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
}

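### 3. View the new mapping
GET product_index/_mapping

### Delete the temporary index
DELETE test_index

「Method 2: use Dynamic Templates」

Here a dynamic template maps every string to keyword, and fields that need a different type are declared explicitly under properties.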

DELETE product_index

PUT product_index
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ],
      "properties": {
        "body_sent": {
          "properties": {
            "bytes": {
              "type": "long"
            }
          }
        },
        "url": {
          "type": "text"
        },
        "username": {
          "type": "keyword"
        }
      }
    }
  }
}

References

Elasticsearch series: mapping (Elasticsearch篇之mapping) https://blog.csdn.net/sinat_35930259/article/details/80354732

https://blog.csdn.net/tanruixing/article/details/88426009

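This article is part of the Tencent Cloud self-media sync exposure program and is shared from the WeChat official account 爱编码.

Originally published: 2020-02-17. In case of infringement, contact cloudcommunity@tencent.com for removal.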
