
Integrating Apache Zeppelin with Spark and Hudi

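1 Environment

1.1 Component versions

1.2 Environment preparation

For integrating Zeppelin with Spark, see the article "Apache Zeppelin 一文打尽".

For building Hudi 0.14.0, see the article "Hudi0.14.0 最新编译".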

2 Integrating Spark and Hudi

2.1 Configuration

%spark.conf
SPARK_HOME /usr/lib/spark

# Execution mode
spark.master yarn
spark.submit.deployMode client

# Equivalent of --jars: the Hudi bundle for Spark 3.2 / Scala 2.12
spark.jars /root/app/jars/hudi-spark3.2-bundle_2.12-0.14.0.jar

# Equivalent of --conf: settings Hudi needs on the Spark session
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.kryo.registrator org.apache.spark.HoodieSparkKryoRegistrar
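To check that the interpreter actually picked up these settings, something like the following could be run in a `%spark` paragraph after the interpreter restarts. This is only a sketch: `expectedConf` and `mismatches` are hypothetical names introduced here, and the values are copied from the configuration paragraph above.

```scala
// Hudi-related settings expected on the session (copied from the %spark.conf paragraph).
val expectedConf: Map[String, String] = Map(
  "spark.serializer"                -> "org.apache.spark.serializer.KryoSerializer",
  "spark.sql.catalog.spark_catalog" -> "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
  "spark.sql.extensions"            -> "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
  "spark.kryo.registrator"          -> "org.apache.spark.HoodieSparkKryoRegistrar"
)

// Returns one message per key whose effective value differs from the expected one.
// `lookup` is any function from a config key to its current value, if set.
def mismatches(lookup: String => Option[String]): Seq[String] =
  expectedConf.toSeq.collect {
    case (key, want) if !lookup(key).contains(want) =>
      s"$key: expected $want, found ${lookup(key).getOrElse("<unset>")}"
  }

// Usage inside Zeppelin, assuming `spark` is the session the interpreter provides:
//   mismatches(key => spark.sparkContext.getConf.getOption(key)).foreach(println)
```

An empty result suggests the four settings were applied; any printed line points at a key that is missing or has a different value.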

2.2 Imports

%spark
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.common.table.HoodieTableConfig._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
import org.apache.hudi.common.model.HoodieRecord
import spark.implicits._

2.3 Inserting data

%spark
val tableName = "trips_table"
val basePath = "hdfs:///tmp/trips_table"

// Column names, in the same order as the tuple fields below
val columns = Seq("ts", "uuid", "rider", "driver", "fare", "city")

val data = Seq(
  (1695159649087L, "334e26e9-8355-45cc-97c6-c31daf0df330", "rider-A", "driver-K", 19.10, "san_francisco"),
  (1695091554788L, "e96c4396-3fad-413a-a942-4cb36106d721", "rider-C", "driver-M", 27.70, "san_francisco"),
  (1695046462179L, "9909a8b1-2d15-4d3d-8ec9-efc48c536a00", "rider-D", "driver-L", 33.90, "san_francisco"),
  (1695516137016L, "e3cf430c-889d-4015-bc98-59bdce1e530c", "rider-F", "driver-P", 34.15, "sao_paulo"),
  (1695115999911L, "c8abbe79-8d89-47ea-b4ce-4d224bae5bfa", "rider-J", "driver-T", 17.85, "chennai"))

val inserts = spark.createDataFrame(data).toDF(columns: _*)

// Write the rows as a Hudi table partitioned by city, replacing any existing data at basePath
inserts.write.format("hudi").
  option(PARTITIONPATH_FIELD_NAME.key(), "city").
  option(TABLE_NAME, tableName).
  mode(Overwrite).
  save(basePath)
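The write above sets only the partition path and the table name and leaves everything else to Hudi's defaults. As a rough sketch, the same options can be spelled out as plain string keys, which makes it easier to see what the imported constants stand for. The record key and precombine entries are assumptions added for illustration; they are not set in the original paragraph, and `hudiWriteOptions` is a hypothetical name.

```scala
// Write options as plain string keys.
val hudiWriteOptions: Map[String, String] = Map(
  // Set in the original paragraph via TABLE_NAME and PARTITIONPATH_FIELD_NAME.key()
  "hoodie.table.name"                           -> "trips_table",
  "hoodie.datasource.write.partitionpath.field" -> "city",
  // Assumptions, not part of the original write: an explicit record key
  // and the field used to pick the latest version of a record
  "hoodie.datasource.write.recordkey.field"     -> "uuid",
  "hoodie.datasource.write.precombine.field"    -> "ts"
)

// Usage inside Zeppelin, assuming `inserts` and `basePath` from the paragraph above:
//   inserts.write.format("hudi").options(hudiWriteOptions).mode("overwrite").save(basePath)
```

Keeping the options in one map also makes it straightforward to reuse them for later writes to the same table.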

2.4 Querying data
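The query paragraph itself is not reproduced in this section; only its result is. A sketch of a query that would return rows of that shape is given here, assuming it runs in a `%spark` paragraph where `spark` is provided and the table from section 2.3 has already been written. The `fare > 20.0` filter is inferred from the rows shown in the result, and `tripsPath`, `tripsDF`, `trips_view` and `expensiveTrips` are hypothetical names.

```scala
// Path the table was written to in section 2.3.
val tripsPath = "hdfs:///tmp/trips_table"

// Load the latest snapshot of the Hudi table as a DataFrame.
val tripsDF = spark.read.format("hudi").load(tripsPath)

// Register a temporary view so the table can be queried with SQL.
tripsDF.createOrReplaceTempView("trips_view")

// Select the business columns and keep only trips with a fare above 20.
val expensiveTrips = spark.sql(
  "SELECT uuid, fare, ts, rider, driver, city FROM trips_view WHERE fare > 20.0")

expensiveTrips.show()
```

Selecting columns explicitly leaves out the `_hoodie_*` metadata columns that Hudi adds to every record.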

Result:

+--------------------+-----+-------------+-------+--------+-------------+
|                uuid| fare|           ts|  rider|  driver|         city|
+--------------------+-----+-------------+-------+--------+-------------+
|e96c4396-3fad-413...| 27.7|1695091554788|rider-C|driver-M|san_francisco|
|9909a8b1-2d15-4d3...| 33.9|1695046462179|rider-D|driver-L|san_francisco|
|e3cf430c-889d-401...|34.15|1695516137016|rider-F|driver-P|    sao_paulo|
+--------------------+-----+-------------+-------+--------+-------------+

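  • Original link: https://page.om.qq.com/page/OAmmECNeM31R_YklmJuWUBQg0
  • The Tencent Cloud Developer Community (腾讯云开发者社区) is one of the distribution channels for Tencent Content Open Platform accounts (企鹅号); this content is republished under the Tencent Content Open Platform Service Agreement.
  • In case of infringement, contact cloudcommunity@tencent.com to have the content removed.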
