1. Mapping field types: Elasticsearch field types play a role similar to column types in MySQL. They fall into four main groups: core types, complex types, geo types, and specialized types.
text, keyword
Create a new mapping with the following field types:
{
  "settings": {"number_of_shards": 3, "number_of_replicas": 0},
  "mappings": {
    "properties": {
      "file_id": {"type": "long"},
      "trip_id": {"type": "long"},
      "data_starttime_utc": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
      "data_endtime_utc": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
      "datauri": {"type": "text"},
      "filesize": {"type": "long", "index": false},
      "data_createtime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
      "data_modifytime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
      "upload_time": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
      "sensor_type": {"type": "keyword"},
      "data_source_system": {"type": "keyword"},
      "creator_id": {"type": "integer"},
      "add_method": {"type": "keyword", "index": false},
      "is_delete": {"type": "boolean"},
      "is_exists": {"type": "boolean", "index": false},
      "data_quality": {"type": "integer", "index": false}
    }
  }
}
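One way to submit a body like this is an HTTP PUT to the index endpoint. The sketch below only builds the request with Python's standard library and does not send it. The host http://localhost:9200 is an assumption, the index name data is taken from the _index value in the search response shown in section 2, and the mapping is abbreviated to a few of the fields above.

```python
import json
import urllib.request

# Assumed values: adjust the host to your own cluster. The index name
# "data" matches the _index seen in the sample search response.
ES_HOST = "http://localhost:9200"
INDEX_NAME = "data"

DATE_FORMAT = "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"

# Abbreviated version of the mapping above (only a few fields shown).
index_body = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "file_id": {"type": "long"},
            "upload_time": {"type": "date", "format": DATE_FORMAT},
            "datauri": {"type": "text"},
            "sensor_type": {"type": "keyword"},
            "filesize": {"type": "long", "index": False},
        }
    },
}


def build_create_index_request(host, index, body):
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(ES_HOST, INDEX_NAME, index_body)
print(request.get_method(), request.full_url)
# To actually create the index: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the JSON before anything reaches the cluster.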
Common numeric types:
long: values range from -2^63 to 2^63 - 1
integer: values range from -2^31 to 2^31 - 1
For that reason, file_id (file ID) and trip_id (trip ID), which may grow large, use long, while creator_id (user ID) uses integer.
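The choice between the two types comes down to a range check. As a minimal sketch (the helper name smallest_numeric_type is hypothetical, and only integer and long are considered), the ranges of a signed 32-bit and a signed 64-bit integer can be compared against a candidate value:

```python
# Value ranges of two Elasticsearch numeric types
# (signed 32-bit and signed 64-bit integers).
NUMERIC_RANGES = {
    "integer": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}


def smallest_numeric_type(value):
    """Return the narrower of integer/long that can hold value."""
    for type_name in ("integer", "long"):
        low, high = NUMERIC_RANGES[type_name]
        if low <= value <= high:
            return type_name
    raise ValueError(f"{value} does not fit in a long")


print(smallest_numeric_type(23))     # creator_id from the sample data -> integer
print(smallest_numeric_type(2**31))  # one past the integer maximum -> long
```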
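The time fields (data_starttime_utc, data_endtime_utc, data_createtime, data_modifytime, upload_time) all hold dates, so they use the date type. Their format setting lists the accepted formats separated by ||: yyyy-MM-dd HH:mm:ss, yyyy-MM-dd, or milliseconds since the epoch.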
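To make the three accepted formats concrete, here is a rough Python approximation of what the format string allows. It is not Elasticsearch's own parser (Python's strptime is more lenient about zero padding, for example), and the function name parse_like_mapping is hypothetical. Values without a time zone are treated as UTC, which is how Elasticsearch interprets them.

```python
from datetime import datetime, timezone

# Python equivalents of the two pattern alternatives in the mapping.
PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_like_mapping(value):
    """Parse a value in one of the mapping's date formats, as UTC."""
    # epoch_millis: an integer, or a string made up of digits.
    is_digit_string = isinstance(value, str) and value.lstrip("-").isdigit()
    if isinstance(value, int) or is_digit_string:
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(value, pattern)
            return parsed.replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"{value!r} matches none of the configured formats")


print(parse_like_mapping("2022-10-12 10:16:50"))
print(parse_like_mapping("2022-10-12"))
print(parse_like_mapping(1665569810000))
```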
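The text type suits fields that need full-text search, such as news article bodies, email content, and other longer pieces of text. A text value is analyzed into individual terms at index time, so a query can match on parts of it. That is why datauri (file path) uses text.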
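The keyword type suits short, structured strings such as host names or personal names. The value is indexed as a single term without analysis, so it can be used for filtering, sorting, and aggregations as well as exact-match queries. That is why sensor_type (sensor type) and data_source_system (source system) use keyword.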
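The difference shows up in how the two kinds of field are queried. The following sketch builds a search body (it only prints the JSON and sends nothing) that combines a match query on the text field with a term filter, a sort, and a terms aggregation on keyword fields. The search values "uvc" and "CAMERA" are borrowed from the sample data, and the aggregation name files_per_sensor is made up for illustration.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Analyzed full-text match on the text field.
            "must": [{"match": {"datauri": "uvc"}}],
            # Exact, unanalyzed match on the keyword field.
            "filter": [{"term": {"sensor_type": "CAMERA"}}],
        }
    },
    # Keyword fields can also be sorted on and aggregated.
    "sort": [{"data_source_system": "asc"}],
    "aggs": {"files_per_sensor": {"terms": {"field": "sensor_type"}}},
}

print(json.dumps(search_body, indent=2))
```

A body like this would typically be sent to the index's _search endpoint.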
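Setting index to false means the field is not indexed. Its value is still kept in _source and returned with the document, but the field is not meant to be searched: depending on the Elasticsearch version and field type, a query on it is either rejected or falls back to a much slower scan instead of using the index.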
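Because index defaults to true, only fields that set it to false explicitly are excluded from search. A small sketch (the helper name non_indexed_fields is hypothetical) that lists those fields from the properties section of the mapping above:

```python
# The non-date fields from the mapping above; date fields are omitted
# for brevity since none of them set "index".
properties = {
    "file_id": {"type": "long"},
    "trip_id": {"type": "long"},
    "datauri": {"type": "text"},
    "filesize": {"type": "long", "index": False},
    "sensor_type": {"type": "keyword"},
    "data_source_system": {"type": "keyword"},
    "creator_id": {"type": "integer"},
    "add_method": {"type": "keyword", "index": False},
    "is_delete": {"type": "boolean"},
    "is_exists": {"type": "boolean", "index": False},
    "data_quality": {"type": "integer", "index": False},
}


def non_indexed_fields(props):
    """Return the names of fields that explicitly set index to false."""
    return sorted(
        name for name, spec in props.items() if spec.get("index") is False
    )


print(non_indexed_fields(properties))
# ['add_method', 'data_quality', 'filesize', 'is_exists']
```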
2. Fields in the search response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 6.1370068,
    "hits": [
      {
        "_index": "data",
        "_type": "_doc",
        "_id": "hp5rWoQB4d-rZxw5aUVa",
        "_score": 6.1370068,
        "_source": {
          "is_delete": true,
          "data_id": 12631,
          "trip_id": 727,
          "data_starttime_utc": null,
          "data_endtime_utc": null,
          "datauri": "uvc/uvc.yaml",
          "filesize": 130,
          "data_createtime": "2022-10-12 10:16:50",
          "data_modifytime": "2022-10-12 10:16:50",
          "upload_time": "2022-10-12 10:16:07",
          "sensor_type": "CAMERA",
          "data_source_system": "PRE_PROCESS",
          "creator_id": 23,
          "add_method": "API",
          "is_exists": false,
          "data_quality": null
        }
      },
      {
        "_index": "data",
        "_type": "_doc",
        "_id": "g55rWoQB4d-rZxw5aUUb",
        "_score": 6.1312265,
        "_source": {
          "is_delete": true,
          "data_id": 12634,
          "trip_id": 727,
          "data_starttime_utc": "2022-08-17 12:20:44",
          "data_endtime_utc": "2022-08-17 12:23:52",
          "datauri": "can1/ego_ctr_20220817122044_000000.bag",
          "filesize": 31791531,
          "data_createtime": "2022-10-12 10:16:50",
          "data_modifytime": "2022-10-12 10:16:50",
          "upload_time": "2022-10-12 10:14:53",
          "sensor_type": "CAN",
          "data_source_system": "PRE_PROCESS",
          "creator_id": 23,
          "add_method": "API",
          "is_exists": false,
          "data_quality": null
        }
      }
    ]
  }
}
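took is the number of milliseconds the whole search request took to execute.
_shards reports the total number of shards that took part in the query, and how many of them succeeded, were skipped, or failed.
timed_out indicates whether the query timed out.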
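These three fields are worth checking before trusting the hits, since a timeout or a failed shard can mean the results are incomplete. A minimal sketch, assuming the response has already been decoded into a Python dict (the function name check_search_health is hypothetical, and the sample is the metadata portion of the response above):

```python
def check_search_health(response):
    """Return warnings derived from timed_out and _shards."""
    warnings = []
    if response["timed_out"]:
        warnings.append("search timed out; results may be partial")
    shards = response["_shards"]
    if shards["failed"] > 0:
        warnings.append(
            f"{shards['failed']} of {shards['total']} shards failed"
        )
    return warnings


sample = {
    "took": 2,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
}

print(f"took {sample['took']} ms, warnings: {check_search_health(sample)}")
# took 2 ms, warnings: []
```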
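Each result in the hits array contains the document's _index, _type, and _id, along with its _score and the _source field. This means the whole document can be used directly from the search results, unlike some other search engines that return only document IDs and require a separate request to fetch each document.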
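Since every hit carries its _source, the documents can be pulled straight out of the decoded response. A short sketch, with a sample trimmed to a few fields of the response above and a hypothetical helper name extract_documents:

```python
def extract_documents(response):
    """Flatten each hit into one dict: _id, _score, and the _source fields."""
    return [
        {"_id": hit["_id"], "_score": hit["_score"], **hit["_source"]}
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the sample response (only a few _source fields kept).
sample = {
    "hits": {
        "total": {"value": 5, "relation": "eq"},
        "hits": [
            {
                "_index": "data",
                "_id": "hp5rWoQB4d-rZxw5aUVa",
                "_score": 6.1370068,
                "_source": {"datauri": "uvc/uvc.yaml", "sensor_type": "CAMERA"},
            },
            {
                "_index": "data",
                "_id": "g55rWoQB4d-rZxw5aUUb",
                "_score": 6.1312265,
                "_source": {
                    "datauri": "can1/ego_ctr_20220817122044_000000.bag",
                    "sensor_type": "CAN",
                },
            },
        ],
    }
}

for doc in extract_documents(sample):
    print(doc["_id"], doc["sensor_type"], doc["datauri"])
```

Note that hits.total.value (5 here) counts all matching documents, while the hits array holds only the ones returned in this response.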