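In Apache Spark, casting a DecimalType(10,5) value such as 99999.99999 to DecimalType(5,4) silently returns null.
Is it possible to change this behavior so that Spark throws an exception (for example, some kind of CastException) and fails the job, instead of silently returning null?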
Posted on 2019-04-15 14:03:35
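/**
 * Change the precision / scale in a given decimal to those set in `decimalType` (if any),
 * returning null if it overflows or modifying `value` in-place and returning it if successful.
 *
 * NOTE: this modifies `value` in-place, so don't call it on external data.
 */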
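The overflow behavior described in that comment can be reproduced with a short sketch. It assumes a local SparkSession and legacy (non-ANSI) cast semantics; the column names `raw`, `value`, and `narrowed` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().master("local[1]").appName("decimal-cast-null").getOrCreate()
import spark.implicits._

// Assumption: legacy cast semantics. This setting exists from Spark 3.0 onward
// and is pinned here so the sketch shows the null-on-overflow behavior.
spark.conf.set("spark.sql.ansi.enabled", "false")

// Build a DecimalType(10,5) column from strings to keep the example simple.
val df = Seq("99999.99999", "1.5")
  .toDF("raw")
  .select(col("raw").cast(DecimalType(10, 5)).as("value"))

// DecimalType(5,4) holds at most 9.9999, so the first row overflows and becomes null.
val narrowed = df.withColumn("narrowed", col("value").cast(DecimalType(5, 4)))
narrowed.show()
```

On Spark 3.0 and later, setting `spark.sql.ansi.enabled` to `true` is documented to make overflowing casts raise an error instead of returning null. Whether that is available depends on the Spark version in use, so treat it as something to verify rather than a guaranteed fix.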
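There is also another thread, "Spark: cast decimal without changing nullable property of column", which suggests that there may be no direct way to make the code fail when a value cannot be cast. So you could check the cast column for null values and add logic that fails the job if any are present.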
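A minimal sketch of that null-check idea follows. `castOrFail` is a hypothetical helper, not a Spark API, and it assumes legacy (non-ANSI) cast semantics, where an overflowing cast yields null. It counts only rows that were non-null before the cast and null after it, so nulls already present in the input do not trigger a failure.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DecimalType}

// Hypothetical helper (not part of Spark): cast `column` to `target` and throw
// if the cast turned any non-null input value into null.
def castOrFail(df: DataFrame, column: String, target: DataType): DataFrame = {
  val casted = col(column).cast(target)
  // This runs an extra Spark job; limit(1) lets it stop at the first offending row.
  val lost = df.filter(col(column).isNotNull && casted.isNull).limit(1).count()
  if (lost > 0) {
    throw new IllegalArgumentException(
      s"Column '$column' contains values that do not fit ${target.simpleString}")
  }
  df.withColumn(column, casted)
}

val spark = SparkSession.builder().master("local[1]").appName("cast-or-fail").getOrCreate()
import spark.implicits._
// Assumption: legacy cast semantics (the setting exists from Spark 3.0 onward).
spark.conf.set("spark.sql.ansi.enabled", "false")

val ok = Seq("1.5").toDF("v").select(col("v").cast(DecimalType(10, 5)).as("v"))
castOrFail(ok, "v", DecimalType(5, 4)).show()

val bad = Seq("99999.99999").toDF("v").select(col("v").cast(DecimalType(10, 5)).as("v"))
// castOrFail(bad, "v", DecimalType(5, 4)) would throw IllegalArgumentException
```

The trade-off is an additional pass over the data before the real job runs, which may be noticeable on large inputs.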
Posted on 2020-01-09 12:47:45
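As I mentioned in my comment above, you can try a UserDefinedFunction to get the behavior you want. I am facing the same problem at the moment and managed to solve it with a UDF. In my case I wanted to cast a column to DoubleType without knowing its type in advance, and I do want my application to fail when parsing fails, rather than get the silent null you describe.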
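In the code below I wrote a udf that takes a struct as its parameter. It tries to parse the only value in that struct as a double. If that fails, it throws an exception, which causes the job to fail.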
import org.apache.spark.SparkException
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, struct, udf}
import spark.implicits._

// Parse the single value wrapped in the struct as a double; throw if it cannot be parsed.
val cast_to_double = udf((number: Row) => {
  try {
    number.get(0) match {
      case s: String => s.toDouble
      case d: Double => d
      case l: Long => l.toDouble
      case i: Int => i.toDouble
      case _ => throw new NumberFormatException
    }
  } catch {
    case _: NumberFormatException =>
      throw new IllegalArgumentException("Can't parse this so called number of yours.")
  }
})

try {
  // Integer column: converts cleanly.
  val intDF = List(1).toDF("something")
  val secondIntDF = intDF.withColumn("something_else", cast_to_double(struct(col("something"))))
  secondIntDF.printSchema()
  secondIntDF.show()

  // Numeric string: parses cleanly.
  val stringIntDF = List("1").toDF("something")
  val secondStringIntDF = stringIntDF.withColumn("something_else", cast_to_double(struct(col("something"))))
  secondStringIntDF.printSchema()
  secondStringIntDF.show()

  // Non-numeric string: the UDF throws once show() triggers execution.
  val stringDF = List("string").toDF("something")
  val secondStringDF = stringDF.withColumn("something_else", cast_to_double(struct(col("something"))))
  secondStringDF.printSchema()
  secondStringDF.show()
} catch {
  // The exception thrown inside the UDF reaches the driver wrapped in a SparkException.
  case se: SparkException => println(se.getCause.getMessage)
}
Output:
root
 |-- something: integer (nullable = false)
 |-- something_else: double (nullable = false)

+---------+--------------+
|something|something_else|
+---------+--------------+
|        1|           1.0|
+---------+--------------+

root
 |-- something: string (nullable = true)
 |-- something_else: double (nullable = false)

+---------+--------------+
|something|something_else|
+---------+--------------+
|        1|           1.0|
+---------+--------------+

root
 |-- something: string (nullable = true)
 |-- something_else: double (nullable = false)

Can't parse this so called number of yours.
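
The same UDF idea could be applied to the decimal case from the question. The sketch below is an illustration, not code from this answer: `strictDecimal` is a hypothetical helper, and choosing `HALF_UP` rounding before the precision check is an assumption about the behavior you want. It throws when a value needs more digits than the target precision allows, so the job fails instead of producing null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.DecimalType

// Hypothetical helper (not a Spark API): builds a UDF that rounds to `scale`
// and throws if the rounded value needs more than `precision` digits.
def strictDecimal(precision: Int, scale: Int): UserDefinedFunction =
  udf((d: java.math.BigDecimal) => {
    val result: java.math.BigDecimal =
      if (d == null) null // keep genuine nulls as nulls
      else {
        val rounded = d.setScale(scale, java.math.RoundingMode.HALF_UP)
        if (rounded.precision > precision) {
          throw new ArithmeticException(s"$d does not fit DecimalType($precision,$scale)")
        }
        rounded
      }
    result
  })

val spark = SparkSession.builder().master("local[1]").appName("strict-decimal-udf").getOrCreate()
import spark.implicits._

val df = Seq("1.5", "99999.99999")
  .toDF("raw")
  .select(col("raw").cast(DecimalType(10, 5)).as("value"))

// The UDF has already validated the value, so the final cast only fixes the column type.
val strict = df.withColumn("narrowed", strictDecimal(5, 4)(col("value")).cast(DecimalType(5, 4)))
// Any action on `strict` (for example strict.show()) fails on the 99999.99999 row.
```

As with the double-parsing UDF, the exception surfaces only when an action runs, and it arrives at the driver wrapped in a Spark exception.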
https://stackoverflow.com/questions/55688810