
Spark dataset.filter on a Chinese column name: physical planning fails with a parse exception after upgrading to 3.3.1?

Asked on 2024-07-25 18:25:06

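I have a MySQL table that contains Chinese column names. I upgraded Spark from 3.0.3 to 3.3.1.

Since the upgrade, filtering on a Chinese column makes Spark SQL throw a parse exception, although the same filter works fine on 3.0.3.

Note that the Chinese column name is wrapped in backticks in the filter expression.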
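For reference, here is a minimal sketch of how a filter string with a backtick-quoted column could be assembled. The helper names `quote_identifier` and `equals_filter` are hypothetical, not part of Spark; the sketch assumes Spark SQL's default behaviour, where a backtick inside a quoted identifier is doubled and string literals accept backslash escapes.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def equals_filter(column: str, value: str) -> str:
    """Build "<column> = '<value>'" with the column backtick-quoted.

    Backslashes and single quotes in the literal are backslash-escaped
    (assumes Spark's default string-literal escaping is in effect).
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"{quote_identifier(column)} = '{escaped}'"


# Same shape as the filter used in this question.
print(equals_filter("人员", "111"))
```

The resulting string can be passed to `filter(...)` as-is. Quoting like this only affects how the filter string itself is parsed; it does not change what happens later during planning.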

spark:3.3.1
dataset.filter(" ( (name = 'name1') ) ")

== Parsed Logical Plan ==
'Filter ('name = name1)
+- Project [人员#541, name#542, 1 AS col1#547]
   +- Project [人员#541, name#542]
      +- Project [cast(人员#537 as string) AS 人员#541, cast(name#538 as string) AS name#542]
         +- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]

== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter (name#542 = name1)
+- Project [人员#541, name#542, 1 AS col1#547]
   +- Project [人员#541, name#542]
      +- Project [cast(人员#537 as string) AS 人员#541, cast(name#538 as string) AS name#542]
         +- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]

== Optimized Logical Plan ==
Project [人员#537, name#538, 1 AS col1#547]
+- Filter (isnotnull(name#538) AND (name#538 = name1))
   +- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]

== Physical Plan ==
*(1) Project [人员#537, name#538, 1 AS col1#547]
+- *(1) Scan JDBCRelation(`test1111`) [numPartitions=1] [人员#537,name#538] PushedFilters: [*IsNotNull(name), *EqualTo(name,name1)], ReadSchema: struct<人员:string,name:string>

Filtering on the Chinese column instead fails when the physical plan is built:
spark:3.3.1
dataset.filter(" ( (`人员` = '111') ) ")

== Parsed Logical Plan ==
'Filter ('人员 = 111)
+- Project [人员#576, name#577, 1 AS col1#582]
   +- Project [人员#576, name#577]
      +- Project [cast(人员#572 as string) AS 人员#576, cast(name#573 as string) AS name#577]
         +- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]

== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter (人员#576 = 111)
+- Project [人员#576, name#577, 1 AS col1#582]
   +- Project [人员#576, name#577]
      +- Project [cast(人员#572 as string) AS 人员#576, cast(name#573 as string) AS name#577]
         +- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]

== Optimized Logical Plan ==
Project [人员#572, name#573, 1 AS col1#582]
+- Filter (isnotnull(人员#572) AND (人员#572 = 111))
   +- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]

== Physical Plan ==
org.apache.spark.sql.catalyst.parser.ParseException: 
Syntax error at or near '人'(line 1, pos 0)

== SQL ==
人员
^^^
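
The `== SQL ==` part of the error shows that the text being parsed is the bare column name `人员`, without the backticks from the original filter string. One possible reading, which is an assumption and not something confirmed in this thread, is that the column name of the pushed-down filter gets parsed again as an identifier during physical planning, and a bare non-ASCII name is not accepted without quoting. The toy model below is not Spark's parser; it only illustrates that rule under the simplified assumption that unquoted identifiers are limited to ASCII letters, digits, and underscores.

```python
import re

# Simplified assumption: unquoted identifiers are ASCII-only,
# anything else has to be wrapped in backticks.
_UNQUOTED = re.compile(r"[A-Za-z0-9_]+")
_BACKQUOTED = re.compile(r"`(?:[^`]|``)*`")


def parses_as_identifier(text: str) -> bool:
    """Return True if text looks like a valid identifier in this toy model."""
    return bool(_UNQUOTED.fullmatch(text) or _BACKQUOTED.fullmatch(text))


for candidate in ["name", "人员", "`人员`"]:
    print(candidate, parses_as_identifier(candidate))
```

In this model `name` and the backtick-quoted form are accepted while the bare `人员` is rejected, which matches the pattern seen in the two 3.3.1 runs above.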

On 3.0.3 the filter runs normally; it only stopped working after the upgrade to 3.3.1.

spark:3.0.3

== Parsed Logical Plan ==
'Filter (('name = name1) AND ('人员 = 111))
+- Project [人员#74, name#75, 1 AS col1#80]
   +- Project [人员#74, name#75]
      +- Project [cast(人员#70 as string) AS 人员#74, cast(name#71 as string) AS name#75]
         +- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]

== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter ((name#75 = name1) AND (人员#74 = 111))
+- Project [人员#74, name#75, 1 AS col1#80]
   +- Project [人员#74, name#75]
      +- Project [cast(人员#70 as string) AS 人员#74, cast(name#71 as string) AS name#75]
         +- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]

== Optimized Logical Plan ==
Project [人员#70, name#71, 1 AS col1#80]
+- Filter (((isnotnull(name#71) AND isnotnull(人员#70)) AND (name#71 = name1)) AND (人员#70 = 111))
   +- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]

== Physical Plan ==
*(1) Project [人员#70, name#71, 1 AS col1#80]
+- *(1) Scan JDBCRelation(`test1111`) [numPartitions=1] [人员#70,name#71] PushedFilters: [*IsNotNull(name), *IsNotNull(人员), *EqualTo(name,name1), *EqualTo(人员,111)], ReadSchema: struct<人员:string,name:string>

Does anyone know the cause, or have an idea for how to fix this?
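
One thing that might be worth trying, as an untested idea rather than a confirmed fix: both successful plans show the filter being pushed down to the JDBC source (`PushedFilters`), and the failure appears at the physical plan stage. The JDBC data source has a `pushDownPredicate` option (default `true`); setting it to `false` keeps filters on the Spark side, which may sidestep the failing step at the cost of reading more rows from MySQL. The sketch below only builds the option map; the URL is a placeholder, credentials are omitted, and the helper name `jdbc_options` is hypothetical.

```python
def jdbc_options(url: str, table: str, push_down_predicate: bool = False) -> dict:
    """Build a JDBC option map with predicate pushdown switched on or off."""
    return {
        "url": url,
        "dbtable": table,
        # Spark reads option values as strings: "true" / "false".
        "pushDownPredicate": str(push_down_predicate).lower(),
    }


# Placeholder URL; `test1111` is the table name from the plans above.
opts = jdbc_options("jdbc:mysql://localhost:3306/testdb", "test1111")
print(opts)

# With a SparkSession named `spark`, usage would look roughly like:
#   df = spark.read.format("jdbc").options(**opts).load()
#   df.filter("`人员` = '111'").explain(True)
```

If the exception disappears with pushdown disabled, that would at least narrow the problem down to the filter pushdown path.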
