If multiple columns contain only whitespace, how do I replace the whitespace with null?
Input dataset:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|104  |     |
|  1|     |     |
+---+-----+-----+
What I tried:
import org.apache.spark.sql.functions._
val test = df.withColumn("col_1","col_2", regexp_replace(df("col_1","col_1"), "^\\s*", lit(Null)))
test.filter("col_1,col_2 is null").show()
Output dataset:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|104  | Null|
|  1|Null | Null|
+---+-----+-----+
Posted on 2019-05-30 20:19:57
Use one withColumn per column:
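import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark, as in spark-shell
val df = List(("0", "104", " "), ("1", " ", "")).toDF("Id", "col_1", "col_2")
// If a value is empty once all whitespace is removed, replace it with null.
val test = df
  .withColumn("col_1", when(regexp_replace(col("col_1"), "\\s+", "") === "", null).otherwise(col("col_1")))
  .withColumn("col_2", when(regexp_replace(col("col_2"), "\\s+", "") === "", null).otherwise(col("col_2")))
// show returns Unit, so call it separately to keep test as a DataFrame
test.show()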
Result:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|  104| null|
|  1| null| null|
+---+-----+-----+
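With more than a couple of columns, the repeated withColumn calls can be folded over a list of column names. The following is a minimal sketch, assuming a local SparkSession; the object name `BlankToNullExample` and the helper `blankToNull` are made up for illustration and are not part of Spark's API. It uses the same regexp_replace check as above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, regexp_replace, when}

object BlankToNullExample {
  // Hypothetical helper: for each named column, replace values that are
  // empty or whitespace-only with null and leave all other values unchanged.
  def blankToNull(df: DataFrame, columns: Seq[String]): DataFrame =
    columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(
        name,
        when(regexp_replace(col(name), "\\s+", "") === "", lit(null))
          .otherwise(col(name)))
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration; in spark-shell, reuse `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("blank-to-null")
      .getOrCreate()
    import spark.implicits._

    val df = List(("0", "104", " "), ("1", " ", "")).toDF("Id", "col_1", "col_2")
    blankToNull(df, Seq("col_1", "col_2")).show()

    spark.stop()
  }
}
```

Each step of the fold returns a new DataFrame with one more column rewritten, so the list of names is the only thing that changes when more columns need the same treatment.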
Posted on 2019-05-30 20:22:57
Hi, you can do it like this:
scala> val someDFWithName = Seq((1, "anurag", ""), (5, "", "")).toDF("id", "name", "age")
someDFWithName: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
scala> someDFWithName.show
+---+------+---+
| id|  name|age|
+---+------+---+
|  1|anurag|   |
|  5|      |   |
+---+------+---+
scala> someDFWithName.na.replace(Seq("name","age"),Map(""-> null)).show
+---+------+----+
| id|  name| age|
+---+------+----+
|  1|anurag|null|
|  5|  null|null|
+---+------+----+
Or you can try this:
scala> someDFWithName.withColumn("name", when(col("name") === "", null).otherwise(col("name"))).withColumn("age", when(col("age") === "", null).otherwise(col("age"))).show
+---+------+----+
| id|  name| age|
+---+------+----+
|  1|anurag|null|
|  5|  null|null|
+---+------+----+
Or, for values made up of multiple spaces, try the following:
scala> val someDFWithName = Seq(("n", "a"), ( "", "n"), (" ", ""), (" ", "a"), (" ",""), (" "," "), ("c"," ")).toDF("name", "place")
someDFWithName: org.apache.spark.sql.DataFrame = [name: string, place: string]
scala> someDFWithName.withColumn("Name", when(regexp_replace(col("name"),"\\s+","") === "", null).otherwise(col("Name"))).withColumn("Place", when(regexp_replace(col("place"),"\\s+","") === "", null).otherwise(col("place"))).show
+----+-----+
|Name|Place|
+----+-----+
|   n|    a|
|null|    n|
|null| null|
|null|    a|
|null| null|
|null| null|
|   c| null|
+----+-----+
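If every string column should be cleaned and you would rather not list them by name, the same check can be driven from the schema in a single select. This is only a sketch under a few assumptions: a local SparkSession, column names without dots, and a helper name (`blankStringsToNull`) invented here for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, regexp_replace, when}
import org.apache.spark.sql.types.StringType

object BlankStringsToNull {
  // Hypothetical helper: rewrite every StringType column so that empty or
  // whitespace-only values become null; columns of other types pass through.
  def blankStringsToNull(df: DataFrame): DataFrame = {
    val projected = df.schema.fields.map { field =>
      if (field.dataType == StringType)
        when(regexp_replace(col(field.name), "\\s+", "") === "", lit(null))
          .otherwise(col(field.name))
          .as(field.name) // keep the original column name
      else
        col(field.name)
    }
    df.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration; in spark-shell, reuse `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("blank-strings-to-null")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "anurag", ""), (5, "   ", "")).toDF("id", "name", "age")
    blankStringsToNull(df).show()

    spark.stop()
  }
}
```

Building all expressions first and applying them in one select avoids a long chain of withColumn calls, which can matter on wide tables.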
I hope this helps. Thanks.
https://stackoverflow.com/questions/56377905