
SparkSQL exception when reading a Hive table's tblproperties

Author: Fayson
Published: 2020-03-10 17:53:54
Column: Hadoop实操

1 Problem Description

Cluster environment and symptoms:

  • SparkSQL reports an error when reading a Parquet-format Hive table
  • Reads of the same Hive Parquet table succeed from Hive and Impala, but fail from spark-sql (a minimal reproduction sketch follows this list)
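A minimal way to reproduce the symptom, assuming the affected table is db_name.table_name (hypothetical placeholder names, not taken from the original post):

-- Run in spark-sql: this statement triggers the exception shown below
SELECT * FROM db_name.table_name LIMIT 1;
-- The same statement in Hive (beeline) or impala-shell returns rows normally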

Exception details

com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input within/between Object entries
at [Source: (String)"{"type":"struct","fields":[{"name":"timestamp","type":"string","nullable":true,"metadata":{"HIVE_TYPE_STRING":"string"}},{"name":"xxx","type":"string","nullable":true,"metadata":{"HIVE_TYPE_STRING":"string"}},{"name":"xxx","type":"string","nullable":true,"; line: 1, column: 513]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipAfterComma2(ReaderBasedJsonParser.java:2323)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipComma(ReaderBasedJsonParser.java:2293)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:664)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:47)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1611)
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1219)
at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:25)
at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:55)
at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:127)
at org.apache.spark.sql.hive.HiveExternalCatalog$.org$apache$spark$sql$hive$HiveExternalCatalog$$getSchemaFromTableProperties(HiveExternalCatalog.scala:1382)
at org.apache.spark.sql.hive.HiveExternalCatalog.restoreDataSourceTable(HiveExternalCatalog.scala:845)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$restoreTableMetadata(HiveExternalCatalog.scala:765)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:734)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:734)

2 Root Cause

The error shows that the table's tblproperties are corrupted: SparkSQL stores the table schema as a JSON string in the table properties (see getSchemaFromTableProperties in the stack trace), and that JSON ends abruptly ("Unexpected end-of-input" at column 513), so it cannot be parsed and the read fails. Hive and Impala do not parse these tblproperties when reading the table, which is why they work normally. The truncation can be confirmed directly in the metastore database, as sketched below.
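A sketch of a metastore query to confirm the truncation, assuming a MySQL-backed Hive metastore. DBS, TBLS and TABLE_PARAMS are standard metastore tables and the spark.sql.sources.schema* keys are where Spark keeps the schema JSON; db_name and table_name are placeholders:

-- Run against the Hive metastore database (MySQL assumed); quoted names are placeholders
SELECT tp.PARAM_KEY, LENGTH(tp.PARAM_VALUE) AS value_len
FROM TABLE_PARAMS tp
JOIN TBLS t ON tp.TBL_ID = t.TBL_ID
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE d.NAME = 'db_name'
  AND t.TBL_NAME = 'table_name'
  AND tp.PARAM_KEY LIKE 'spark.sql.sources.schema%';
-- A schema part whose length equals the column's maximum (256 here) indicates truncation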

3 Solution

  • The incomplete tblproperties point to truncation in the metastore table that stores table parameters. Checking the TABLE_PARAMS table in the metastore database shows that the PARAM_VALUE column is only 256 characters wide, which confirms the cause
  • Widening PARAM_VALUE to 8000 characters resolves the issue (a sketch of the DDL follows this list)
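A minimal sketch of the column change, assuming a MySQL-backed metastore; back up the metastore first and adapt the DDL to whatever database actually backs your metastore:

-- MySQL metastore assumed; take a metastore backup before altering it
ALTER TABLE TABLE_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(8000);
-- Widening the column prevents future truncation; properties that were already
-- truncated may still need to be rewritten, e.g. by re-creating the affected table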
Originally published 2020-02-28 on the WeChat public account Hadoop实操.
