Apache Hudi 0.12.2 Released

从大数据到人工智能

Published 2023-01-12 11:27:17

Included in the column: 大数据-BigData

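Long-Term Support Release

We aim to maintain 0.12 for a longer period and to provide a stable release, through the latest 0.12.x version, for users to migrate to. This release (0.12.2) is the latest 0.12 release.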

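Migration Guide

This release (0.12.2) does not introduce any new table version, so no migration is needed if you are on 0.12.0.

If you are migrating from an older release, review the migration guides in the earlier release notes, in particular the upgrade instructions in 0.6.0, 0.9.0, 0.10.0, 0.11.0, and 0.12.0.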
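
As a rough illustration of the guidance above, the following Python sketch lists which of the named releases' upgrade notes a user would still need to read. The helper name `upgrade_notes_to_review` and the rule "every listed release newer than the one you currently run" are assumptions made for this example; neither is part of Hudi.

```python
from typing import List, Tuple

# Releases whose upgrade notes the migration guide singles out.
RELEASES_WITH_UPGRADE_NOTES = ["0.6.0", "0.9.0", "0.10.0", "0.11.0", "0.12.0"]


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.10.1' into (0, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_notes_to_review(current: str) -> List[str]:
    """Hypothetical helper: listed releases that are newer than `current`."""
    current_key = parse_version(current)
    return [
        release
        for release in RELEASES_WITH_UPGRADE_NOTES
        if parse_version(release) > current_key
    ]


if __name__ == "__main__":
    print(upgrade_notes_to_review("0.9.0"))
    print(upgrade_notes_to_review("0.12.0"))
```

Versions are compared as integer tuples rather than strings because a plain string comparison would sort "0.10.0" before "0.9.0". For a user already on 0.12.0 the list is empty, which matches the statement that no migration is needed.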

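Bug Fixes

The 0.12.2 release is mainly for bug fixes and stability. The fixes span many components, including:

DeltaStreamer
Data type/schema-related bug fixes
Table services
Metadata table
Spark SQL
Presto stability/performance fixes
Trino stability/performance fixes
Meta sync
Flink engine
Unit, functional, and integration tests and CI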

Release Notes

Sub-task

[HUDI-5244] – Fix bugs in schema evolution client with lost operation field and not found schema

Bug

[HUDI-3453] – Metadata table throws NPE when scheduling compaction plan
[HUDI-3661] – Flink async compaction is not thread safe when using watermark
[HUDI-4281] – Using hudi to build a large number of tables in spark on hive causes OOM
[HUDI-4588] – Ingestion failing if source column is dropped
[HUDI-4855] – Bootstrap table from Deltastreamer cannot be read in Spark
[HUDI-4893] – More than 1 splits are created for a single log file for MOR table
[HUDI-4898] – For MOR table, Presto/Hive should respect payload class when merging parquet file and log file
[HUDI-4901] – Add avro version to Flink profiles
[HUDI-4946] – merge into with no preCombineField has dup row in only insert
[HUDI-4952] – Reading from metadata table could fail when there are no completed commits
[HUDI-4966] – Meta sync throws exception if TimestampBasedKeyGenerator is used to generate partition path containing slashes
[HUDI-4971] – aws bundle causes class loading issue
[HUDI-4975] – datahub sync bundle causes class loading issue
[HUDI-4998] – Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work
[HUDI-5003] – InLineFileSystem will throw NumberFormatException because the type of startOffset is int and the value is out of bounds
[HUDI-5007] – Prevent Hudi from reading the entire timeline when performing a LATEST streaming read
[HUDI-5008] – Avoid unset HoodieROTablePathFilter in IncrementalRelation
[HUDI-5025] – Rollback failed with log file not found when rollOver in rollback process
[HUDI-5041] – Lock metric register conflict error
[HUDI-5057] – Fix msck repair hudi table
[HUDI-5058] – The primary key cannot be empty when Flink reads an error from the hudi table
[HUDI-5061] – Bulk insert operation doesn't throw exceptions other than IOException
[HUDI-5063] – totalScantime and other run time stats missing from commit metadata
[HUDI-5070] – Fix Flaky TestCleaner test : testInsertAndCleanByCommits
[HUDI-5076] – Non serializable path used with engineContext with metadata table initialization
[HUDI-5087] – Max value read from metatable incorrect
[HUDI-5088] – Failed to synchronize the hive metadata of the Flink table
[HUDI-5092] – Querying Hudi table throws NoSuchMethodError in Databricks runtime
[HUDI-5096] – boolean param is broken in HiveSyncTool
[HUDI-5097] – Read 0 records from partitioned table without partition fields in table configs
[HUDI-5151] – Flink data skipping doesn't work with ClassNotFoundException of InLineFileSystem
[HUDI-5157] – Duplicate partition path for chained hudi tables.
[HUDI-5163] – Failure handling w/ spark ds write failures
[HUDI-5176] – Incremental source may miss commits if there are inflight commits before completed commits
[HUDI-5185] – Compaction run fails with --hoodieConfigs
[HUDI-5203] – Debezium payload does not handle null-field cases
[HUDI-5228] – Flink table service job fs view conf overwrites the one of writing job
[HUDI-5242] – Do not fail Meta sync in Deltastreamer when inline table service fails
[HUDI-5251] – Unexpected avro dependency in flink 1.15 bundle
[HUDI-5253] – HoodieMergeOnReadTableInputFormat could have duplicate records issue if it contains delta files while still splittable
[HUDI-5260] – Insert into sql with strict insert mode and no preCombineField should not overwrite existing records
[HUDI-5277] – RunClusteringProcedure can't exit correctly
[HUDI-5286] – UnsupportedOperationException throws when enabling filesystem retry
[HUDI-5291] – NPE in column stats for null values
[HUDI-5320] – Spark SQL CTAS does not propagate Table properties to actual SparkSqlWriter
[HUDI-5325] – Fix Create Table to propagate properly Metadata Table enabling config
[HUDI-5336] – Fix log file parsing to consider "." at the beginning
[HUDI-5346] – Fixing performance traps in CTAS
[HUDI-5347] – Fix Merge Into performance traps
[HUDI-5350] – OOM causes compaction event to be lost
[HUDI-5351] – Handle meta fields being disabled in Bulk Insert Partitioners
[HUDI-5373] – Different fileids are assigned to the same bucket
[HUDI-5375] – Fix re-using of file readers w/ metadata table in FileIndex
[HUDI-5393] – Remove the reuse of metadata table writer for flink write client
[HUDI-5403] – Input Format class has metadata table enabled for file listing unexpectedly by default
[HUDI-5409] – Avoid file index and use fs view cache in COW input format
[HUDI-5412] – Send the bootstrap event if the JM also rebooted

Improvement

[HUDI-4526] – improve spillableMapBasePath disk directory is full
[HUDI-4799] – Improve analyzer exception tip when expression cannot be resolved
[HUDI-4960] – Upgrade Jetty version for Timeline server
[HUDI-4980] – Make avg record size calculated based on commit instant only
[HUDI-4995] – Dependency conflicts on apache http with other projects
[HUDI-4997] – Use jackson-v2 to replace jackson-v1 imports
[HUDI-5002] – Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement
[HUDI-5027] – Replace hardcoded hbase config keys with HbaseConstants
[HUDI-5045] – Add tests to integ test to test bulk_insert followed by upsert
[HUDI-5066] – Support hoodie source metaclient cache for flink planner
[HUDI-5102] – Source operators (monitor and reader) support user uid
[HUDI-5104] – Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter
[HUDI-5111] – Add metadata on read support to integ tests
[HUDI-5184] – Remove export PYSPARK_SUBMIT_ARGS="--master local[*]" from HoodiePySparkQuickstart.py
[HUDI-5247] – Clean up java client tests
[HUDI-5296] – Support disabling schema on read if not required
[HUDI-5338] – Adjust coalesce behavior within "NONE" sort mode for bulk insert
[HUDI-5344] – Upgrade com.google.protobuf:protobuf-java
[HUDI-5345] – Avoid fs.exists calls for metadata table in HFileBootstrapIndex
[HUDI-5348] – Cache file slices within MDT reader
[HUDI-5357] – Optimize release artifacts' deployment
[HUDI-5370] – Properly close file handles for Metadata writer