Testing date_histogram aggregation on timestamp (long) fields in Elasticsearch with Python

sparkexpert
Published 2019-05-26 14:09:41

Our earlier storage design (on Elasticsearch 5.0) did not use the date type; time values were stored as long timestamps (epoch milliseconds) instead.

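After moving to a newer ES version (6.0) and looking at the date_histogram aggregation, we found that its interval parameter is quite flexible and accepts a range of time specifications, as listed below: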

Here are the valid time specifications and their meanings:

milliseconds (ms)

Fixed length interval; supports multiples.

seconds (s)

1000 milliseconds; fixed length interval (except for the last second of a minute that contains a leap-second, which is 2000ms long); supports multiples.

minutes (m)

All minutes begin at 00 seconds.

  • One minute (1m) is the interval between 00 seconds of the first minute and 00 seconds of the following minute in the specified timezone, compensating for any intervening leap seconds, so that the number of minutes and seconds past the hour is the same at the start and end.
  • Multiple minutes (nm) are intervals of exactly 60x1000=60,000 milliseconds each.

hours (h)

All hours begin at 00 minutes and 00 seconds.

  • One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00 minutes of the following hour in the specified timezone, compensating for any intervening leap seconds, so that the number of minutes and seconds past the hour is the same at the start and end.
  • Multiple hours (nh) are intervals of exactly 60x60x1000=3,600,000 milliseconds each.

days (d)

All days begin at the earliest possible time, which is usually 00:00:00 (midnight).

  • One day (1d) is the interval between the start of the day and the start of the following day in the specified timezone, compensating for any intervening time changes.
  • Multiple days (nd) are intervals of exactly 24x60x60x1000=86,400,000 milliseconds each.

weeks (w)

  • One week (1w) is the interval between the start day_of_week:hour:minute:second and the same day of the week and time of the following week in the specified timezone.
  • Multiple weeks (nw) are not supported.

months (M)

  • One month (1M) is the interval between the start day of the month and time of day and the same day of the month and time of the following month in the specified timezone, so that the day of the month and time of day are the same at the start and end.
  • Multiple months (nM) are not supported.

quarters (q)

  • One quarter (1q) is the interval between the start day of the month and time of day and the same day of the month and time of day three months later, so that the day of the month and time of day are the same at the start and end.
  • Multiple quarters (nq) are not supported.

years (y)

  • One year (1y) is the interval between the start day of the month and time of day and the same day of the month and time of day the following year in the specified timezone, so that the date and time are the same at the start and end.
  • Multiple years (ny) are not supported.
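
The rules in the list above can be condensed into a small check. This is only a sketch: `is_valid_interval` is a hypothetical helper, not part of any Elasticsearch client, and it covers only the abbreviated forms such as `1w` or `90m` (not the named forms such as `week` or `month` used later in this post).

```python
import re

# Units that accept a multiplier (e.g. "90m") versus calendar units that only
# accept a count of one, following the list above. The helper name is
# hypothetical and not part of any Elasticsearch client.
MULTIPLE_OK = {"ms", "s", "m", "h", "d"}
SINGLE_ONLY = {"w", "M", "q", "y"}

_PATTERN = re.compile(r"^(\d+)(ms|s|m|h|d|w|M|q|y)$")


def is_valid_interval(spec: str) -> bool:
    """Return True if `spec` (such as "1w" or "90m") fits the rules listed above."""
    match = _PATTERN.match(spec)
    if match is None:
        return False
    count, unit = int(match.group(1)), match.group(2)
    if count < 1:
        return False
    # Fixed-length units may be multiplied; calendar units must be exactly 1.
    return unit in MULTIPLE_OK or (unit in SINGLE_ONLY and count == 1)


for spec in ["90m", "1w", "2w", "1M", "3M", "500ms"]:
    print(spec, is_valid_interval(spec))
```

Note that the unit is case-sensitive: `1m` is one minute, while `1M` is one month.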

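How to run date_histogram on the old long timestamps was less obvious. Many posts online claim that it cannot be used on them directly, and setting the interval to a fixed number of seconds cannot express a calendar week or month.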
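
A quick way to see why a fixed number of seconds cannot stand in for a month is to compute the month lengths for one year. The sketch below uses only the standard library; `month_lengths_ms` is a hypothetical helper name, and the calculation ignores daylight saving time and leap seconds.

```python
import calendar

MS_PER_DAY = 24 * 60 * 60 * 1000


def month_lengths_ms(year: int) -> dict:
    """Map each month number to its length in milliseconds (ignoring DST and leap seconds)."""
    return {
        month: calendar.monthrange(year, month)[1] * MS_PER_DAY
        for month in range(1, 13)
    }


lengths = month_lengths_ms(2018)
# Several distinct lengths appear, so no single fixed interval matches every month.
print(sorted(set(lengths.values())))
# A week does have a fixed length, but a fixed-length bucket is not
# guaranteed to start on the first day of the calendar week.
print(7 * MS_PER_DAY)
```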

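Actual testing, however, shows that ES handles this data without extra work: date_histogram runs directly on the long timestamp field. The test scripts are as follows: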

(1) Write to ES, storing the time as a long timestamp

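import datetime
import time
from random import randint

from elasticsearch import Elasticsearch, helpers


def WriteES():
    """Bulk-write one document per day, storing time as a long epoch-millisecond timestamp."""
    es = Elasticsearch()

    base = datetime.datetime.today()
    numdays = 100

    actions = []
    # As in the original loop, j runs from 0 to 100 inclusive (101 documents).
    for j in range(numdays + 1):
        d1 = base - datetime.timedelta(days=j)
        # Local time -> epoch milliseconds
        ts = int(time.mktime(d1.timetuple()) * 1000)
        actions.append({
            "_index": "tickets",
            "_type": "last",  # mapping types apply to ES 6.x and earlier
            "_id": j,
            "_source": {
                "count": randint(0, 1000),
                "timestamp": ts,
            },
        })

    helpers.bulk(es, actions)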
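
The timestamp conversion used when writing, and the reverse conversion used later when printing, can be checked in isolation. This is a minimal sketch with the standard library only; `to_epoch_ms` and `from_epoch_ms` are hypothetical helper names, and both work in the machine's local time zone, as the script does.

```python
import datetime
import time


def to_epoch_ms(dt: datetime.datetime) -> int:
    """Convert a naive local-time datetime to epoch milliseconds (sub-second part dropped)."""
    return int(time.mktime(dt.timetuple()) * 1000)


def from_epoch_ms(ts: int) -> datetime.datetime:
    """Convert epoch milliseconds back to a naive local-time datetime."""
    return datetime.datetime.fromtimestamp(ts // 1000)


d = datetime.datetime(2018, 11, 6, 16, 44, 3)
ts = to_epoch_ms(d)
# The numeric value depends on the local time zone; the round trip does not.
print(ts, from_epoch_ms(ts))
```

Because `timetuple()` discards microseconds, the stored value always ends in `000`.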

(2) Aggregation test:

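import datetime

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search


def AggES():
    """Sum `count` per calendar week with date_histogram on the long timestamp field."""
    client = Elasticsearch()

    s = Search(using=client)
    # The field 'timestamp' holds epoch milliseconds stored as long.
    s.aggs.bucket('per_tag', 'date_histogram', field='timestamp', interval='week') \
        .metric('clicks_per_day', 'sum', field='count')

    response = s.execute()

    print('Query results')
    for hit in response:
        st = datetime.datetime.fromtimestamp(hit.timestamp // 1000).strftime('%Y-%m-%d %H:%M:%S')
        print(hit.meta.score, hit.count, st)

    print('Aggregation results')
    for tag in response.aggregations.per_tag.buckets:
        # Bucket keys are epoch milliseconds; convert to local time for display.
        st = datetime.datetime.fromtimestamp(tag.key // 1000).strftime('%Y-%m-%d %H:%M:%S')
        print(st, tag.clicks_per_day.value)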
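
For readers who do not use elasticsearch_dsl, the request built in AggES() should correspond roughly to the plain request body below. This is a sketch written out by hand under the ES 6.x syntax used in this post, not output captured from the library.

```python
import json

# Request body assumed to be equivalent to the Search object built in AggES()
# (ES 6.x syntax; the bucket and metric names are the ones used in this post).
body = {
    "aggs": {
        "per_tag": {
            "date_histogram": {"field": "timestamp", "interval": "week"},
            "aggs": {
                "clicks_per_day": {"sum": {"field": "count"}},
            },
        }
    }
}

print(json.dumps(body, indent=2))
```

With the low-level client, a body like this could be sent with `Elasticsearch().search(index="tickets", body=body)`. Later Elasticsearch releases split `interval` into `calendar_interval` and `fixed_interval`, so the parameter name may need adjusting on newer clusters.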

(3) The printed output shows that weekly statistics are obtained directly:

Query results
1.0 720 2018-11-06 16:44:03
1.0 438 2018-10-23 16:44:03
1.0 403 2018-10-18 16:44:03
1.0 113 2018-10-15 16:44:03
1.0 503 2018-10-13 16:44:03
1.0 928 2018-10-12 16:44:03
1.0 89 2018-10-11 16:44:03
1.0 590 2018-10-08 16:44:03
1.0 854 2018-09-27 16:44:03
1.0 846 2018-09-26 16:44:03
Aggregation results
2018-07-23 08:00:00 618.0
2018-07-30 08:00:00 3657.0
2018-08-06 08:00:00 4519.0
2018-08-13 08:00:00 3609.0
2018-08-20 08:00:00 3204.0
2018-08-27 08:00:00 3378.0
2018-09-03 08:00:00 3365.0
2018-09-10 08:00:00 4609.0
2018-09-17 08:00:00 3594.0
2018-09-24 08:00:00 3918.0
2018-10-01 08:00:00 3098.0
2018-10-08 08:00:00 4251.0
2018-10-15 08:00:00 3235.0
2018-10-22 08:00:00 2689.0
2018-10-29 08:00:00 4493.0
2018-11-05 08:00:00 1254.0
work done!
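
The 08:00:00 in every bucket label is most likely a display effect: date_histogram aligns buckets to UTC by default, while `fromtimestamp` renders the key in local time. The sketch below reproduces this under the assumption that the script ran on a machine at UTC+8; the offset is not stated in the post.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the author's machine ran at UTC+8; the offset is not stated in the post.
LOCAL = timezone(timedelta(hours=8))

# Epoch milliseconds of 2018-07-23 00:00:00 UTC, which under the assumption
# above is the key of the first weekly bucket in the output.
key_ms = int(datetime(2018, 7, 23, tzinfo=timezone.utc).timestamp() * 1000)

as_utc = datetime.fromtimestamp(key_ms // 1000, tz=timezone.utc)
as_local = datetime.fromtimestamp(key_ms // 1000, tz=LOCAL)

print(as_utc.strftime('%Y-%m-%d %H:%M:%S'))    # 2018-07-23 00:00:00
print(as_local.strftime('%Y-%m-%d %H:%M:%S'))  # 2018-07-23 08:00:00
print(as_utc.weekday())                        # 0, i.e. Monday
```

If buckets should instead start at local midnight, date_histogram accepts a `time_zone` parameter (for example `"+08:00"`).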

(4) Monthly statistics: only the interval setting needs to change

 interval='month'

Aggregation results
2018-07-01 08:00:00 2162.0
2018-08-01 08:00:00 15719.0
2018-09-01 08:00:00 16590.0
2018-10-01 08:00:00 15752.0
2018-11-01 08:00:00 3268.0
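
To cross-check numbers like these on the client side, the same grouping can be sketched in plain Python. This is an illustration of the idea, not how Elasticsearch implements it: `sum_by_month` and `ms` are hypothetical helpers, the sample documents are made up, and months are taken in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def sum_by_month(docs):
    """Sum `count` per UTC calendar month; keys are epoch ms of each month's start."""
    totals = defaultdict(int)
    for doc in docs:
        dt = datetime.fromtimestamp(doc["timestamp"] // 1000, tz=timezone.utc)
        month_start = datetime(dt.year, dt.month, 1, tzinfo=timezone.utc)
        totals[int(month_start.timestamp() * 1000)] += doc["count"]
    return dict(sorted(totals.items()))


def ms(year, month, day):
    """Epoch milliseconds of midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


# Made-up sample documents in the same shape as those written by WriteES().
docs = [
    {"timestamp": ms(2018, 10, 30), "count": 5},
    {"timestamp": ms(2018, 10, 31), "count": 7},
    {"timestamp": ms(2018, 11, 1), "count": 11},
]
print(sum_by_month(docs))
```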

Originally published: 2018-11-09
