
Spark DataFrame: adding a column with a when condition

Stack Overflow user
Asked on 2020-04-23 04:26:24
1 answer · 189 views · 0 followers · 0 votes

My requirement is as follows.

I join two DataFrames like this:

     val c = a.join(b, keys, "fullouter")

c.printSchema() prints:

     |-- add: string (nullable = true)
     |-- sub: string (nullable = true)
     |-- delete: string (nullable = true)
     |-- mul: long (nullable = true)
     |-- ADD: string (nullable = true)
     |-- SUB: string (nullable = true)
     |-- DELETE: string (nullable = true)
     |-- MUL: long (nullable = true)

Everything works up to this point.

Now I add a new column using a when condition, as follows:

     val d = c.withColumn("column", when(c("a.add") === c("b.ADD"), "Neardata"))

The error message is:

    Exception in thread "main" org.apache.spark.sql.AnalysisException: 
    Cannot resolve column name "a.add"

I also tried the following:

     val d = c.withColumn("column", when(col("a.add") === col("b.ADD"), "Neardata"))

This fails with an error as well.

Any suggestions?

Stack Overflow user

Accepted answer

Answered on 2020-04-23 04:54:02

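You have to define the aliases with dataframe.as("a") and dataframe1.as("b"). Spark only resolves qualified names such as "a.add" and "b.ADD" if the DataFrames were given those aliases; the Scala variable names a and b are not visible to Spark.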

Example:

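  import org.apache.spark.sql.functions.{col, when}
  import spark.implicits._  // assumes an existing SparkSession named `spark`

  val data = List(("James","","Smith","36636","M",60000),
    ("Michael","Rose","","40288","M",70000),
    ("Robert","","Williams","42114","",400000),
    ("Maria","Anne","Jones","39192","F",500000),
    ("Jen","Mary","Brown","","F",0))

  val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
  // .as("a") gives the DataFrame the alias "a", so "a.gender" can be resolved below
  val df = spark.createDataFrame(data).toDF(cols:_*).as("a")
  // Note: "a.new_gender" is used literally as the new column's name (see the output header)
  val df2 = df.withColumn("a.new_gender", when(col("a.gender") === "M", "Male")
    .when(col("a.gender") === "F", "Female")
    .otherwise("Unknown"))
  df2.show()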

Output:

+----------+-----------+---------+-----+------+------+------------+
|first_name|middle_name|last_name|  dob|gender|salary|a.new_gender|
+----------+-----------+---------+-----+------+------+------------+
|     James|           |    Smith|36636|     M| 60000|        Male|
|   Michael|       Rose|         |40288|     M| 70000|        Male|
|    Robert|           | Williams|42114|      |400000|     Unknown|
|     Maria|       Anne|    Jones|39192|     F|500000|      Female|
|       Jen|       Mary|    Brown|     |     F|     0|      Female|
+----------+-----------+---------+-----+------+------+------------+

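I think that without an alias you cannot reference columns this way, and that is probably the cause of your error. For example, this does not work: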

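  // Fails with an AnalysisException: "df" is only the Scala variable name,
  // not an alias that Spark knows about.
  val df2 = df.withColumn("df.new_gender", when(col("df.gender") === "M", "Male")
    .when(col("df.gender") === "F", "Female")
    .otherwise("Unknown"))
  df2.show()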
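Applied to the join in the question, the fix would be to alias both DataFrames before joining, so that "a.add" and "b.ADD" exist in the joined plan. This is a minimal sketch, assuming the Spark Scala API; the object name AliasJoinExample, the helper flagNeardata, the join key "id" (a stand-in for the question's keys) and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object AliasJoinExample {

  // Hypothetical helper: aliases both inputs *before* the join so the
  // qualifiers "a" and "b" exist, then compares a.add with b.ADD.
  // Rows where the values differ, or where one side is null after the
  // full outer join, get null in "column" because there is no .otherwise(...).
  def flagNeardata(a: DataFrame, b: DataFrame, keys: Seq[String]): DataFrame = {
    val c = a.as("a").join(b.as("b"), keys, "fullouter")
    c.withColumn("column", when(col("a.add") === col("b.ADD"), "Neardata"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-join-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; "id" stands in for the question's `keys`.
    val a = Seq((1, "x"), (2, "y")).toDF("id", "add")
    val b = Seq((1, "x"), (3, "z")).toDF("id", "ADD")

    flagNeardata(a, b, Seq("id")).show()

    spark.stop()
  }
}
```

If non-matching rows should carry a value instead of null, chain .otherwise(...) after the when, as in the answer's example.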
Votes: 1
Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/61374524
