
Hudi Data Lake Technology Leading the New Wave of Big Data (3): Resolving Spark Module Dependency Conflicts

Author: Maynor · Published 2023-07-28 15:43:44

Resolving Spark Module Dependency Conflicts

We changed the Hive version to 3.1.2, which pulls in Jetty 9.3.x, while Hudi itself uses Jetty 9.4.x, so there is a dependency conflict.
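Before touching the pom files, you can confirm which Jetty versions the Hive dependencies actually pull in. A minimal check, assuming the Hudi 0.12.0 source is unpacked at the path used throughout this guide:

cd /opt/software/hudi-0.12.0
# List every org.eclipse.jetty artifact resolved for the spark bundle (and the modules it depends on)
mvn -pl packaging/hudi-spark-bundle -am dependency:tree -Dincludes=org.eclipse.jetty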

1) Modify the pom file of hudi-spark-bundle to exclude the lower-version Jetty and add the Jetty version specified by Hudi:

vim /opt/software/hudi-0.12.0/packaging/hudi-spark-bundle/pom.xml

Around line 382 of that file, modify as follows (the additions are the exclusion blocks and the explicit Jetty dependencies):

<!-- Hive -->
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.pentaho</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service-rpc</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.datanucleus</groupId>
      <artifactId>datanucleus-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-common</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty.orbit</groupId>
      <artifactId>javax.servlet</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Add the Jetty version that Hudi is configured to use -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-webapp</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-http</artifactId>
  <version>${jetty.version}</version>
</dependency>

Otherwise, when Spark is used to insert data into a Hudi table, the following error is thrown:

java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
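After rebuilding with the exclusions in place, a quick sanity check is to confirm that the relocated Jetty class inside the bundle jar actually has this method. The jar name below assumes the Spark 3.2 / Scala 2.12 build of Hudi 0.12.0; adjust it to your build output:

cd /opt/software/hudi-0.12.0/packaging/hudi-spark-bundle/target
# Print the setHttpOnly method of the relocated Jetty SessionHandler; no output suggests an older Jetty was bundled
javap -cp hudi-spark3.2-bundle_2.12-0.12.0.jar org.apache.hudi.org.apache.jetty.server.session.SessionHandler | grep setHttpOnly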


2) Modify the pom file of hudi-utilities-bundle to exclude the lower-version Jetty and add the Jetty version specified by Hudi:

vim /opt/software/hudi-0.12.0/packaging/hudi-utilities-bundle/pom.xml

Around line 405 of that file, modify as follows (again, the additions are the exclusion blocks and the explicit Jetty dependencies):

<!-- Hoodie -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-common</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-client-common</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Hive -->
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.pentaho</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service-rpc</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.datanucleus</groupId>
      <artifactId>datanucleus-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-common</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty.orbit</groupId>
      <artifactId>javax.servlet</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Add the Jetty version that Hudi is configured to use -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-webapp</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-http</artifactId>
  <version>${jetty.version}</version>
</dependency>

Otherwise, a similar Jetty error is thrown when the DeltaStreamer tool is used to insert data into a Hudi table.
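If you want an end-to-end smoke test of the rebuilt hudi-utilities-bundle, a minimal DeltaStreamer run looks roughly like the sketch below. The jar name, paths, table name and properties file are placeholders rather than values from the original guide, and the properties file must at least point DeltaStreamer at your source data and record key fields:

spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /opt/software/hudi-0.12.0/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.12.0.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field ts \
  --target-base-path /tmp/hudi_smoke_test \
  --target-table hudi_smoke_test \
  --props /tmp/dfs-source.properties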

2.2.6 Run the Build Command

mvn clean package -DskipTests -Dspark3.2 -Dflink1.13 -Dscala-2.12 -Dhadoop.version=3.1.3 -Pflink-bundle-shade-hive3
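When iterating on the pom changes above, it is usually faster to rebuild only the affected bundle and the modules it depends on rather than the whole project. A sketch using standard Maven reactor options (not a command from the original guide):

# Rebuild just the spark bundle; -am ("also make") builds its upstream modules as well
mvn clean package -DskipTests -Dspark3.2 -Dscala-2.12 -pl packaging/hudi-spark-bundle -am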

2.2.7 Successful Build

After the build succeeds, being able to enter hudi-cli confirms that the compilation worked.
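The CLI launcher ships with the source tree, so once the build has finished you can start it roughly like this (assuming the same source path as above):

cd /opt/software/hudi-0.12.0
# Start the Hudi CLI shell built in the previous step
hudi-cli/hudi-cli.sh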


After the build completes, the resulting bundle jars can be found under the corresponding modules of the packaging directory.


For example, the Flink bundle for Hudi, alongside the other bundles:
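A quick listing makes the output locations explicit; the glob patterns below are assumptions, since the exact jar names depend on the Spark/Flink/Scala versions chosen for the build:

cd /opt/software/hudi-0.12.0
ls packaging/hudi-flink-bundle/target/hudi-flink*bundle*.jar
ls packaging/hudi-spark-bundle/target/hudi-spark*bundle*.jar
ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle*.jar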
