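Because older Elasticsearch versions did not support the date type, our earlier storage (on version 5.0) was designed around numeric timestamps.
After moving to the newer ES version (6.0), which supports the date_histogram aggregation, I found that its interval parameter is quite flexible and can express many kinds of intervals, as follows: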
Here are the valid time specifications and their meanings:
milliseconds (ms): Fixed length interval; supports multiples.
seconds (s): 1000 milliseconds; fixed length interval (except for the last second of a minute that contains a leap-second, which is 2000ms long); supports multiples.
minutes (m): All minutes begin at 00 seconds.
hours (h): All hours begin at 00 minutes and 00 seconds.
days (d): All days begin at the earliest possible time, which is usually 00:00:00 (midnight).
weeks (w)
months (M)
quarters (q)
years (y)
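It was unclear, however, how to run date_histogram against the timestamp field left over from the old version. Many posts online claim that such a field cannot be used directly, and setting interval to an equivalent number of seconds cannot be pinned to calendar weeks or months.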
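To see why a fixed number of seconds cannot stand in for a calendar month, here is a minimal sketch using only the standard library. The helper name `month_lengths_in_seconds` is made up for this illustration and is not part of Elasticsearch.

```python
import calendar


def month_lengths_in_seconds(year):
    """Return the length of each month of `year` in seconds."""
    return [calendar.monthrange(year, month)[1] * 86400 for month in range(1, 13)]


lengths = month_lengths_in_seconds(2018)
# Months last between 28 and 31 days, so no single value in seconds fits all of them.
print(sorted(set(lengths)))
```

A calendar-aware interval such as 'month' avoids this problem because the bucket boundaries follow the calendar instead of a constant width.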
Actual testing, however, showed that ES handles this kind of data on its own. The test scripts are as follows:
(1) Write to ES, storing timestamp as a long (epoch milliseconds):
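import datetime
import time
from random import randint

from elasticsearch import Elasticsearch, helpers


def WriteES():
    '''Write to ES.'''
    es = Elasticsearch()
    base = datetime.datetime.today()
    numdays = 100
    actions = []
    # One document per day, from today back to numdays days ago (inclusive)
    for j in range(numdays + 1):
        d1 = base - datetime.timedelta(days=j)
        # Local time converted to epoch milliseconds, stored as a long
        ts = int(time.mktime(d1.timetuple()) * 1000)
        action = {
            "_index": "tickets",
            "_type": "last",
            "_id": j,
            "_source": {
                "count": randint(0, 1000),
                "timestamp": ts
            }
        }
        actions.append(action)
    # Send all documents in a single bulk request
    helpers.bulk(es, actions)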
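The script above builds the timestamp from local time with `time.mktime`. As a small sketch of the same conversion with timezone-aware datetimes (the helper names `to_epoch_millis` and `from_epoch_millis` are hypothetical, chosen for this example), the round trip between a datetime and epoch milliseconds could look like this:

```python
import datetime


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to integer epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def from_epoch_millis(ms):
    """Convert epoch milliseconds back to a UTC datetime."""
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)


sample = datetime.datetime(2018, 11, 6, 8, 44, 3, tzinfo=datetime.timezone.utc)
ms = to_epoch_millis(sample)
print(ms, from_epoch_millis(ms).isoformat())
```

Working with aware datetimes makes it explicit which time zone a stored value refers to, which matters later when reading the bucket keys.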
(2) Aggregation test:
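import datetime

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search


def AggES():
    client = Elasticsearch()
    # Restrict the search to the index written in step (1)
    s = Search(using=client, index='tickets')
    # Weekly buckets on the long timestamp field, summing count in each bucket
    s.aggs.bucket('per_tag', 'date_histogram', field='timestamp', interval='week') \
        .metric('clicks_per_day', 'sum', field='count')
    response = s.execute()
    print('Query results')
    for hit in response:
        # timestamp is in epoch milliseconds; fromtimestamp expects seconds
        st = datetime.datetime.fromtimestamp(hit.timestamp // 1000).strftime('%Y-%m-%d %H:%M:%S')
        print(hit.meta.score, hit.count, st)
    print('Aggregation results')
    for tag in response.aggregations.per_tag.buckets:
        # Bucket keys are epoch milliseconds marking the start of each bucket
        st = datetime.datetime.fromtimestamp(tag.key // 1000).strftime('%Y-%m-%d %H:%M:%S')
        print(st, tag.clicks_per_day.value)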
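For readers who do not use elasticsearch_dsl, the aggregation above corresponds roughly to the following raw request body. This is a sketch rather than the exact JSON the library emits: the function name `build_date_histogram_body` is hypothetical, and `"size": 0` is my own addition to skip the hits when only the buckets are wanted.

```python
import json


def build_date_histogram_body(field, interval, metric_field):
    """Build a search body with a date_histogram bucket and a sum sub-aggregation."""
    return {
        "size": 0,  # assumption: only the aggregation is of interest
        "aggs": {
            "per_tag": {
                "date_histogram": {"field": field, "interval": interval},
                "aggs": {"clicks_per_day": {"sum": {"field": metric_field}}},
            }
        },
    }


body = build_date_histogram_body("timestamp", "week", "count")
print(json.dumps(body, indent=2))
```

With the elasticsearch client used in this post, such a body could be sent with something like es.search(index="tickets", body=body).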
(3) The printed output shows that weekly statistics can be produced quickly:
Query results
1.0 720 2018-11-06 16:44:03
1.0 438 2018-10-23 16:44:03
1.0 403 2018-10-18 16:44:03
1.0 113 2018-10-15 16:44:03
1.0 503 2018-10-13 16:44:03
1.0 928 2018-10-12 16:44:03
1.0 89 2018-10-11 16:44:03
1.0 590 2018-10-08 16:44:03
1.0 854 2018-09-27 16:44:03
1.0 846 2018-09-26 16:44:03
Aggregation results
2018-07-23 08:00:00 618.0
2018-07-30 08:00:00 3657.0
2018-08-06 08:00:00 4519.0
2018-08-13 08:00:00 3609.0
2018-08-20 08:00:00 3204.0
2018-08-27 08:00:00 3378.0
2018-09-03 08:00:00 3365.0
2018-09-10 08:00:00 4609.0
2018-09-17 08:00:00 3594.0
2018-09-24 08:00:00 3918.0
2018-10-01 08:00:00 3098.0
2018-10-08 08:00:00 4251.0
2018-10-15 08:00:00 3235.0
2018-10-22 08:00:00 2689.0
2018-10-29 08:00:00 4493.0
2018-11-05 08:00:00 1254.0
work done!
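Every bucket in the weekly output starts on a Monday at 08:00:00. A likely explanation is that the buckets are aligned to Monday 00:00:00 UTC and the script prints them in local time. The sketch below reproduces that alignment locally; `week_bucket_start` is a hypothetical helper, and the UTC+8 offset is an assumption about the machine that produced the output.

```python
import datetime

UTC = datetime.timezone.utc
# Assumption: the test machine ran at UTC+8
LOCAL = datetime.timezone(datetime.timedelta(hours=8))


def week_bucket_start(ms):
    """Floor epoch milliseconds to Monday 00:00:00 UTC of the same week."""
    dt = datetime.datetime.fromtimestamp(ms / 1000, tz=UTC)
    monday = dt - datetime.timedelta(days=dt.weekday())
    start = monday.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(start.timestamp() * 1000)


# 2018-11-06 16:44:03 at UTC+8, the first hit in the output above
doc = datetime.datetime(2018, 11, 6, 16, 44, 3, tzinfo=LOCAL)
key = week_bucket_start(int(doc.timestamp() * 1000))
label = datetime.datetime.fromtimestamp(key / 1000, tz=LOCAL).strftime('%Y-%m-%d %H:%M:%S')
print(label)  # 2018-11-05 08:00:00
```

The printed label matches the last weekly bucket in the output, which is consistent with the UTC alignment described here.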
(4) Monthly statistics: only the corresponding setting needs to change
interval='month'
Aggregation results
2018-07-01 08:00:00 2162.0
2018-08-01 08:00:00 15719.0
2018-09-01 08:00:00 16590.0
2018-10-01 08:00:00 15752.0
2018-11-01 08:00:00 3268.0
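To cross-check a monthly result like this on the client side, the same grouping can be done in plain Python. This is only a sketch: `month_bucket_start`, `sum_by_month` and `noon_utc_ms` are hypothetical helpers, and the sample documents are invented for the example, not taken from the run above.

```python
import datetime
from collections import defaultdict

UTC = datetime.timezone.utc


def month_bucket_start(ms):
    """Floor epoch milliseconds to the first day of the month, 00:00:00 UTC."""
    dt = datetime.datetime.fromtimestamp(ms / 1000, tz=UTC)
    start = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int(start.timestamp() * 1000)


def sum_by_month(docs):
    """Sum the count field per calendar month, keyed by bucket start in ms."""
    totals = defaultdict(int)
    for doc in docs:
        totals[month_bucket_start(doc["timestamp"])] += doc["count"]
    return dict(sorted(totals.items()))


def noon_utc_ms(year, month, day):
    """Epoch milliseconds for 12:00 UTC on the given date."""
    return int(datetime.datetime(year, month, day, 12, tzinfo=UTC).timestamp() * 1000)


# Invented sample documents with the same fields as the index above
docs = [
    {"timestamp": noon_utc_ms(2018, 10, 8), "count": 5},
    {"timestamp": noon_utc_ms(2018, 10, 23), "count": 7},
    {"timestamp": noon_utc_ms(2018, 11, 6), "count": 3},
]
totals = sum_by_month(docs)
for key, total in totals.items():
    print(datetime.datetime.fromtimestamp(key / 1000, tz=UTC).strftime('%Y-%m-%d'), total)
```

Running the same function over the documents fetched from the index should give sums comparable to the server-side buckets, as long as both sides use the same time zone for the month boundaries.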