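Code in the open-source world changes fast, driven by the community and by geek culture. This is especially visible in the Hadoop ecosystem, though it also owes a lot to Hadoop's wide adoption in industry and the many requirements that come with it (big data and cloud computing have been hyped to death in recent years, haha). Bugs, improvements, and new features fly around for every component in the ecosystem. Having just seen the Hadoop version history chart below that someone put together, I felt it was worth compiling a history of Hive's new features as well, for my own reference.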
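Hive 0.8.0 added new features including Bitmap Indexes, the TIMESTAMP datatype, the Plugin Developer Kit, and JDBC Driver Improvements.
This release is quite old by now, so it is not covered in detail here.
For details, see: http://blog.cloudera.com/blog/2011/11/coming-attractions-apache-hive-0-8-0/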
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12316178
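1. Support for CREATE OR REPLACE VIEW.
2. Improved error messages.
3. Support for NOT IN and NOT LIKE.
4. Ctrl+C now issues a kill command that kills the currently running query job without exiting the Hive CLI.
5. The number of map and reduce tasks is printed.
6. Better performance for "select xx,xx from xxx LIMIT xxx" queries.
7. Support for the BETWEEN operator.
8. The PRINTF() function.
9. More lenient data type handling in COALESCE and UNION ALL.
10. The TIMESTAMP data type.
11. A new "INSERT OVERWRITE TABLE X PARTITION (a=b, c=d) IF NOT EXISTS ..." operation, which leaves the partition untouched if it already exists.
12. Faster query compilation and job startup after a Hive job is submitted.
For details, see: What's New in Apache Hive 0.9.0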
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742
Cube and Rollup: Hive now has support for creating cubes with rollups. Thanks to Namit!
List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!
Better Windows Support: Several Hive 0.10.0 fixes support running Hive natively on Windows. There is no more cygwin dependency. Thanks to Kanna!
‘Explain’ Adds More Info: Now you can do an explain dependency and the explain plan will contain all the tables and partitions touched upon by the query. Thanks to Sambavi!
Improved Authorization: The metastore can now optionally do authorization checks on the server side instead of on the client, providing you with a better security profile. Thanks to Sushanth!
Faster Simple Queries: Some simple queries that don’t require aggregations, and therefore MapReduce jobs, can now run faster. Thanks to Navis!
Better YARN Support: This release contains additional work aimed at making Hive work well with Hadoop YARN. While not all test cases are passing yet, there has been a lot of good progress made with this release. Thanks to Zhenxiao!
Union Optimization: Hive queries with unions will now result in a lower number of MapReduce jobs under certain conditions. Thanks to Namit!
Undo Your Drop Table: While not really truly ‘undo’, you can now reinstate your table after dropping it. Thanks to Andrew!
Show Create Table: This lets you see how you created your table. Thanks to Feng!
Support for Avro Data: Hive now has built-in support for reading/writing Avro data. Thanks to Jakob!
Skewed Joins: Hive’s support for joins involving skewed data is now improved. Thanks to Namit!
Robust Connection Handling at the Metastore Layer: Connection handling between a metastore client and server and also between a metastore server and the database layer has been improved. Thanks to Bhushan and Jean!
More Statistics: It's now possible to collect and store scalar-valued statistics for your tables and partitions. This will enable better query planning in upcoming releases. Thanks to Shreepadma!
Better-Looking HWI: HWI now uses a Bootstrap JavaScript library. It looks really slick.
For details on Hive 0.10.0, see: http://zh.hortonworks.com/blog/apache-hive-0-10-0-is-now-available/
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
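To make the Cube and Rollup item from the list above a bit more concrete, here is a minimal Python sketch of the grouping sets that a GROUP BY ... WITH ROLLUP or WITH CUBE expands to. It only illustrates the semantics and says nothing about how Hive implements them. The function names (`rollup_sets`, `cube_sets`, `aggregate`) and the sample rows are hypothetical, and the sketch assumes the grouped columns contain no NULL values of their own.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(cols):
    """Grouping sets for WITH ROLLUP: every prefix of cols, longest first."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """Grouping sets for WITH CUBE: every subset of cols, largest first."""
    return [tuple(c) for r in range(len(cols), -1, -1)
            for c in combinations(cols, r)]


def aggregate(rows, cols, grouping_sets, measure):
    """Sum `measure` once per grouping set.

    Columns that are not part of the current grouping set are reported
    as None, mirroring the NULLs that show up in the SQL result.
    """
    totals = defaultdict(int)
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in cols)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data, for illustration only.
rows = [
    {"country": "US", "city": "NYC", "sales": 10},
    {"country": "US", "city": "SF", "sales": 5},
    {"country": "CN", "city": "Beijing", "sales": 7},
]
cols = ["country", "city"]

rollup = aggregate(rows, cols, rollup_sets(cols), "sales")
cube = aggregate(rows, cols, cube_sets(cols), "sales")

for key, total in rollup.items():
    print(key, total)
```

With two grouping columns, the rollup yields per-city rows, per-country subtotals, and a grand total. The cube additionally yields per-city totals across all countries, such as the key `(None, "NYC")`.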
For details on Hive 0.11, see: http://zh.hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered/
For details on Hive 0.12, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-12/
For details on Hive 0.13, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-13-completion-stinger-initiative/
[HIVE-5317] - Implement insert, update, and delete in Hive with full ACID support
[HIVE-5775] - Introduce Cost Based Optimizer to Hive
[HIVE-5823] - Support for DECIMAL primitive type in AvroSerDe
[HIVE-6455] - Scalable dynamic partitioning and bucketing optimization
[HIVE-6469] - skipTrash option in hive command line
[HIVE-6806] - CREATE TABLE should support STORED AS AVRO
[HIVE-7036] - get_json_object bug when extract list of list with index
[HIVE-7054] - Support ELT UDF in vectorized mode
[HIVE-7068] - Integrate AccumuloStorageHandler
[HIVE-7090] - Support session-level temporary tables in Hive
[HIVE-7158] - Use Tez auto-parallelism in Hive
[HIVE-7203] - Optimize limit 0
[HIVE-7255] - Allow partial partition spec in analyze command
[HIVE-7299] - Enable metadata only optimization on Tez
[HIVE-7341] - Support for Table replication across HCatalog instances
[HIVE-7390] - Make single quote character optional and configurable in BeeLine CSV/TSV output
[HIVE-7416] - provide context information to authorization checkPrivileges api call
[HIVE-7430] - Implement SMB join in tez
[HIVE-7446] - Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
[HIVE-7506] - MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
[HIVE-7509] - Fast stripe level merging for ORC
[HIVE-7547] - Add ipAddress and userName to ExecHook
[HIVE-7587] - Fetch aggregated stats from MetaStore
[HIVE-7654] - A method to extrapolate columnStats for partitions of a table
[HIVE-7826] - Dynamic partition pruning on Tez
[HIVE-8531] - Fold is not null filter if there are other comparison filter present on same column
This release has no new features.
[HIVE-3405] - UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase
[HIVE-7122] - Storage format for create like table
[HIVE-8435] - Add identity project remover optimization
[HIVE-7998] - Enhance JDBC Driver to not require class specification
[HIVE-9039] - Support Union Distinct
[HIVE-9188] - BloomFilter support in ORC
[HIVE-9277] - Hybrid Hybrid Grace Hash Join
[HIVE-9302] - Beeline add commands to register local jdbc driver names and jars
[HIVE-9780] - Add another level of explain for RDBMS audience
[HIVE-10038] - Add Calcite's ProjectMergeRule.
[HIVE-10099] - Enable constant folding for Decimal
[HIVE-10591] - Support limited integer type promotion in ORC
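One item in the list above, HIVE-9188, adds Bloom filter support to ORC. As a rough illustration of why such a filter helps with lookups, here is a minimal Python sketch of the data structure: if the filter says a value is definitely absent, a reader can skip the chunk of data it summarizes. The `BloomFilter` class, its parameters, and the SHA-256 based hashing are assumptions made for this example and do not reflect ORC's actual implementation.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, value):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical values stored in one chunk of a column.
chunk_filter = BloomFilter()
for name in ["alice", "bob", "carol"]:
    chunk_filter.add(name)

print(chunk_filter.might_contain("alice"))
```

A value that was added always reports as possibly present. A value that was never added usually reports as absent, but it can occasionally collide with set bits, which is why a positive answer still requires reading the data.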
[1] New features in Hive 0.8.0 and 0.9.0 http://superlxw1234.iteye.com/blog/1564461
[2] Overview of new features in Hive 0.10 and 0.11 http://blog.csdn.net/lalaguozhe/article/details/11730817
[3] http://hive.apache.org/downloads.html
[4] Hive's roadmap for the next two years http://www.infoq.com/cn/news/2014/09/hive
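(1) ACID transaction support: users will be able to insert, update, and delete existing data. Hive will evolve from a traditional write-once, read-many system into one that supports analytics over changing data.
(2) Sub-second queries: users will be able to use Hive for scenarios with stricter response-time requirements, such as interactive dashboards and exploratory analysis.
(3) Full support for SQL:2011 Analytics: users will be able to deploy complex reports on Hive using standard SQL, faster, more simply, and more reliably. A powerful cost-based optimizer will keep tool-generated and complex queries running fast. At that point Hive will offer on Hadoop the full expressive power that enterprise SQL users are used to. On top of its existing support for window functions, user-defined functions, subqueries, Rollup, Cube, standard aggregations, inner joins, outer joins, semi joins, and cross joins, it will add support for non-equi joins, set operations (union, intersect, except), interval types, and more.
Stinger.next is planned to take 18 months and to be delivered in three phases. Transaction support is due by the end of 2014, sub-second queries in the first half of 2015, and full SQL:2011 Analytics support by the end of 2015.
In addition, Hive will integrate with the Spark machine learning framework, so that users can run machine learning models through Hive.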