
Hive Table Sampling – Concept and Example

一个会写诗的程序员
Published 2021-12-16 11:01:12

Hive Table Sampling Concept

Relational databases such as SQL Server support queries that run on a relatively small number of rows sampled from a very large table. In this article, we look at the Hive table sampling concept, its methods, and some examples.

The Hive TABLESAMPLE clause lets users write queries against a sample of the data instead of the whole table. Sampling comes in handy when you are working with large tables and queries take a long time to return results. The TABLESAMPLE clause can be added to any table in the FROM clause.

Types of Hive Sampling

There are two types of Hive table sampling:

  • Sampling Bucketized Table
  • Hive Block Sampling

Hive Table Sampling Syntax

Bucketized Sampling

The following is the syntax of bucketized sampling:
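As documented in the Hive language manual, the clause names one bucket out of a total number of buckets, optionally followed by ON and a column. A minimal sketch, assuming a hypothetical table page_views with a user_id column (neither name comes from the original article):

```sql
-- General form: TABLESAMPLE (BUCKET x OUT OF y [ON colname])
-- page_views, user_id, and the alias pv are illustrative names only.
SELECT *
FROM page_views TABLESAMPLE (BUCKET 3 OUT OF 32 ON user_id) pv;
```

The table alias, if any, goes after the TABLESAMPLE clause rather than before it.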

Block Sampling
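Per the Hive language manual, block sampling takes a number followed by the PERCENT keyword. A minimal sketch, again assuming a hypothetical table named page_views:

```sql
-- General form: TABLESAMPLE (n PERCENT)
-- page_views and the alias pv are illustrative names only.
SELECT *
FROM page_views TABLESAMPLE (10 PERCENT) pv;
```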

Here, buckets are numbered starting from 1. colname names the column on which each row in the table is sampled. Using rand() instead of colname samples on the entire row rather than on an individual column.

In block sampling, n is the percentage of the data size to read.

Hive Sampling Bucketized Table

Bucketized table sampling lets you retrieve sample records by bucket number. It works best when the table is bucketed on the sampled column; otherwise Hive has to scan the whole table to pick out the sample.

You provide the bucket number, starting from 1, along with the colname on which to sample each row in the Hive table. You can also use rand() to sample on the entire row instead of an individual column.

For example, the following query returns a random sample of rows from bucket 1.
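To make the bucketing requirement concrete, here is a sketch of a bucketed table and a sample taken on its clustering column. The table name, columns, bucket count, and storage format are all assumptions chosen for illustration:

```sql
-- Hypothetical table, bucketed into 32 buckets on user_id.
CREATE TABLE page_views (
  user_id   BIGINT,
  url       STRING,
  view_time TIMESTAMP
)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Sampling on the clustering column lets Hive read only the matching
-- buckets instead of scanning the whole table.
SELECT *
FROM page_views TABLESAMPLE (BUCKET 3 OUT OF 16 ON user_id) pv;
```

According to the Hive language manual, when the sampled column matches the CLUSTERED BY column, a request for bucket 3 out of 16 on a 32-bucket table reads two of the stored buckets (the 3rd and the 19th), since each of the 16 sample buckets is made up of two stored ones.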
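A query of this kind might look like the sketch below. The table name page_views and the total of 10 buckets are assumptions made for illustration, not values from the original article:

```sql
-- Take bucket 1 of 10, assigning rows to buckets by rand()
-- rather than by the hash of a specific column.
-- page_views and the alias pv are illustrative names only.
SELECT *
FROM page_views TABLESAMPLE (BUCKET 1 OUT OF 10 ON rand()) pv;
```

Because rand() is used instead of a column, Hive scans the whole table to build the sample, and repeated runs can return different rows.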

Hive Block Sampling

This sampling method lets Hive pick up at least n% of the data size. Note that PERCENT does not refer to a percentage of rows; it is a percentage of the table's size. If the table is small, the query may return all rows.

For example, the following query uses 0.1% or more of the input size.
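A sketch of such a query, assuming a hypothetical table named page_views:

```sql
-- Read at least 0.1% of the table's input size.
-- page_views and the alias pv are illustrative names only.
SELECT *
FROM page_views TABLESAMPLE (0.1 PERCENT) pv;
```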

Hope this helps 🙂

Shared from the author's personal site/blog. Originally published: 2021/9/9 (afternoon).