Apache Hive Moves to In-Memory Computing, with a 26x Performance Gain

首席架构师智库 (Chief Architect Think Tank)

Published 2018-04-09 15:49:26

Included in the column: 超级架构师 (Super Architect)

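Apache Hive 2.1 was released a few months ago. It introduces in-memory computing, which greatly improves Hive's query performance and is likely to affect the current competitive landscape of SQL on Hadoop. According to tests, performance improves by about 26 times.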

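Apache Hive 2.1 brings six major improvements:

(1) LLAP. Apache Hive 2.0 introduced LLAP (Live Long And Process), and 2.1 optimizes it substantially; compared with Apache Hive 1, performance improves by about 25 times;

(2) More robust SQL ACID support;

(3) 2x ETL performance improvement, through a smarter CBO (Cost Based Optimizer), faster type conversions, and dynamic partitioning optimizations;

(4) Support for stored procedures, which greatly simplifies migration from an EDW to Hive. This is implemented through the open-source project HPL/SQL (Apache license, http://www.hplsql.org/), whose goal is to add stored procedures to Apache Hive, SparkSQL, Impala and other SQL-on-Hadoop implementations, as well as any NoSQL store and RDBMS;

(5) Vectorized execution support for text-format data;

(6) New diagnostics and monitoring tools, including a new HiveServer2 UI, an LLAP UI, and an improved Tez UI.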
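Several of the items above are controlled through Hive session or site settings. The following is a minimal sketch, not a recommended configuration: it renders a handful of settings as HiveQL `SET` statements. The property names are standard Hive configuration keys, but the chosen values and the helper `render_set_statements` are assumptions for illustration; which values are appropriate depends on the cluster and the Hive version.

```python
# Illustrative only: the values below are assumptions, not tuned recommendations.
SETTINGS = {
    "hive.execution.engine": "tez",               # run queries on Tez
    "hive.llap.execution.mode": "all",            # let eligible work run in LLAP daemons
    "hive.cbo.enable": "true",                    # cost-based optimizer
    "hive.vectorized.execution.enabled": "true",  # vectorized query execution
}


def render_set_statements(settings):
    """Hypothetical helper: turn a dict of settings into HiveQL SET statements."""
    return [f"SET {key}={value};" for key, value in settings.items()]


if __name__ == "__main__":
    for statement in render_set_statements(SETTINGS):
        print(statement)
```

The printed statements could be pasted at the top of a Hive session script; a site-wide configuration would normally go into hive-site.xml instead.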

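The rest of this article looks at the optimization that matters most for Apache Hive 2.1 performance: LLAP. LLAP stands for "Live Long and Process". It introduces a distributed, persistent query service combined with an optimized data cache, so query jobs start quickly and unnecessary disk I/O is avoided. In short, LLAP is a next-generation distributed computing architecture: it intelligently caches data in the memory of multiple machines and lets all clients share the cached data, while retaining elastic scalability.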
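The idea of a long-lived service whose cache is shared by all clients can be sketched in a few lines. This is a toy model, not LLAP's implementation: `SlowStore` and `LongLivedDaemon` are hypothetical names, the "storage" is an in-process dictionary, and there is no eviction, distribution, or concurrency handling.

```python
class SlowStore:
    """Stands in for disk or HDFS; counts how often a chunk is physically read."""

    def __init__(self, chunks):
        self._chunks = dict(chunks)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._chunks[key]


class LongLivedDaemon:
    """Hypothetical stand-in for a persistent query daemon with a shared cache."""

    def __init__(self, store):
        self._store = store
        self._cache = {}  # lives as long as the daemon, across queries and clients

    def get_chunk(self, key):
        # Only touch the slow store when the chunk is not already in memory.
        if key not in self._cache:
            self._cache[key] = self._store.read(key)
        return self._cache[key]

    def sum_column(self, key):
        return sum(self.get_chunk(key))


store = SlowStore({("sales", "amount"): [10, 20, 30]})
daemon = LongLivedDaemon(store)

total_a = daemon.sum_column(("sales", "amount"))  # client A: cold, reads storage
total_b = daemon.sum_column(("sales", "amount"))  # client B: served from memory
print(total_a, total_b, store.reads)
```

Because the daemon outlives individual queries, the second client pays no startup cost and triggers no additional read from the slow store, which is the effect the paragraph above describes.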

Compared with Hive 1 + Tez, Hive 2 + Tez + LLAP is about 26 times faster. The test results are shown in the figure below (they were obtained with https://github.com/hortonworks/hive-testbench):

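The introduction of LLAP in Hive 2 marks Apache Hive's entry into the era of in-memory computing. In summary, in-memory computing falls into the following three types: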

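Of these, Type 1 has been shown within the Apache Hadoop ecosystem not to deliver particularly high performance, so Hive moved directly to Type 2. It currently supports all Type 2 capabilities well, including distributed memory management and optimization and in-memory data sharing. In addition, Apache Hive is optimizing performance further, including support for Flash as a new storage medium and extending LLAP so that it can process compressed data directly, without decompressing it first.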
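To make "processing compressed data without decompressing it first" concrete, here is a small sketch that uses run-length encoding (RLE) as the compression scheme. RLE is chosen purely for illustration; the article does not say which encodings LLAP would operate on, and the function names are hypothetical.

```python
def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def sum_on_runs(runs):
    """SUM(column) computed on the encoded form: one multiply per run."""
    return sum(v * n for v, n in runs)


def count_equal_on_runs(runs, target):
    """COUNT(*) WHERE column = target, again without expanding the runs."""
    return sum(n for v, n in runs if v == target)


column = [5, 5, 5, 7, 7, 5]
runs = rle_encode(column)

print(runs)
print(sum_on_runs(runs), sum(column))
print(count_equal_on_runs(runs, 5))
```

Both aggregates touch one entry per run rather than one entry per row, so the work shrinks along with the data whenever the column compresses well.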

RECENT HIVE RELEASES

Apache Hive Version	Prior Enhancements
2.1	Hive LLAP: Persistent query servers with intelligent in-memory caching; ACID GA: Hardened and proven at scale; Expanded SQL Compliance: More capable integration with BI tools; Performance: Interactive query, 2x faster ETL; Security: Row / Column security extending to views, Column level security for Spark; Operations: LLAP integration in Ambari, new Grafana dashboards
2.0	Speed: HBase to store Hive Metadata; Workflow: HPL/SQL – Implementing Procedural SQL in Hive; Scale: first version of LLAP, and Hive on Spark; SQL: Hive-on-Spark Self Union/Join
1.2	Speed: Vectorized Map Join brings up to 5x faster map joins; Scale: Hybrid-Hybrid Grace Hash Join allows analytical queries at large scale without complex tuning; Scale: Bloom Filter support added to ORCFile; SQL: Added support for UNION DISTINCT and Interval Types
0.14	Speed: Cost-based optimizer for star and bushy join queries; Scale: Temporary tables; Scale: Transactions with ACID semantics
0.13	Speed: Hive on Tez, vectorized query engine & cost-based optimizer; Scale: dynamic partition loads and smaller hash tables; SQL: CHAR & DECIMAL datatypes, subqueries for IN / NOT IN
0.12	Speed: Vectorized query engine & ORCFile predicate pushdown; SQL: Support for VARCHAR and DATE semantics, GROUP BY on structs and unions

References:

http://zh.hortonworks.com/apache/hive/

http://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/

http://hortonworks.com/blog/apache-hive-going-memory-computing/

Originally published on 2016-11-01; shared from a WeChat official account.
