
Using regexp_replace to replace blank values when there are multiple spaces

Stack Overflow user
Asked on 2019-05-30 19:59:29
2 answers, 248 views, 0 followers, 0 votes

If several columns contain values that consist only of spaces, how can I replace those values with null?

Input dataset:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|104  |     |
|  1|     |     |
+---+-----+-----+

What I tried:

import org.apache.spark.sql.functions._

val test = df.withColumn("col_1","col_2", regexp_replace(df("col_1","col_1"), "^\\s*", lit(Null)))
test.filter("col_1,col_2 is null").show()

Expected output dataset:

+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|104  | Null|
|  1|Null | Null|
+---+-----+-----+

2 Answers

Stack Overflow user

Accepted answer

Answered on 2019-05-30 20:19:57

Use one withColumn call per column:

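import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`, as in spark-shell

val df = List(("0", "104", "    "), ("1", " ", "")).toDF("Id", "col_1", "col_2")

// Strip all whitespace; if nothing is left, the value was blank, so replace it with null.
val test = df
  .withColumn("col_1", when(regexp_replace(col("col_1"), "\\s+", "") === "", null).otherwise(col("col_1")))
  .withColumn("col_2", when(regexp_replace(col("col_2"), "\\s+", "") === "", null).otherwise(col("col_2")))

// Call show separately so that `test` holds the DataFrame rather than Unit.
test.show()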

Result:

+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|  104| null|
|  1| null| null|
+---+-----+-----+
Votes: 1

Stack Overflow user

Answered on 2019-05-30 20:22:57

Hi, you can do it like this:

scala> val someDFWithName = Seq((1, "anurag", ""), (5, "", "")).toDF("id", "name", "age")
someDFWithName: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]

scala> someDFWithName.show
+---+------+---+
| id|  name|age|
+---+------+---+
|  1|anurag|   |
|  5|      |   |
+---+------+---+
scala> someDFWithName.na.replace(Seq("name","age"),Map(""-> null)).show
+---+------+----+
| id|  name| age|
+---+------+----+
|  1|anurag|null|
|  5|  null|null|
+---+------+----+

Alternatively, you can try this:

scala> someDFWithName.withColumn("name", when(col("name") === "", null).otherwise(col("name"))).withColumn("age", when(col("age") === "", null).otherwise(col("age"))).show
+---+------+----+
| id|  name| age|
+---+------+----+
|  1|anurag|null|
|  5|  null|null|
+---+------+----+

Or, if a value may contain multiple spaces, try the following:

scala> val someDFWithName = Seq(("n", "a"), ( "", "n"), ("         ", ""), ("  ", "a"), ("   ",""), ("        ","   "), ("c"," ")).toDF("name", "place")
someDFWithName: org.apache.spark.sql.DataFrame = [name: string, place: string]

scala> someDFWithName.withColumn("Name", when(regexp_replace(col("name"),"\\s+","") === "", null).otherwise(col("Name"))).withColumn("Place", when(regexp_replace(col("place"),"\\s+","") === "", null).otherwise(col("place"))).show
+----+-----+
|Name|Place|
+----+-----+
|   n|    a|
|null|    n|
|null| null|
|null|    a|
|null| null|
|null| null|
|   c| null|
+----+-----+
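
To see why this check handles any number of spaces while the pattern in the question does not, here is a small plain-Scala sketch without Spark. It assumes that Spark's regexp_replace follows Java regular-expression semantics, which `String.replaceAll` also uses. The object `BlankCheckSketch` and the helper `blankToNone` are hypothetical names, and `None` stands in for null.

```scala
object BlankCheckSketch {
  // Hypothetical helper mirroring the check above: strip every run of
  // whitespace and return None (standing in for null) if nothing is left.
  def blankToNone(value: String): Option[String] =
    if (value.replaceAll("\\s+", "").isEmpty) None else Some(value)

  def main(args: Array[String]): Unit = {
    // The pattern from the question only removes leading whitespace;
    // it never turns a value into null.
    println("[" + "  104  ".replaceAll("^\\s*", "") + "]") // prints "[104  ]"

    println(blankToNone("        ")) // prints "None"
    println(blankToNone(""))         // prints "None"
    println(blankToNone("a b"))      // prints "Some(a b)"
  }
}
```

Note that the stripped string is used only for the comparison; a non-blank value such as "a b" is returned unchanged, which matches the `otherwise(col(...))` branch in the Spark version.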

I hope this helps. Thanks.

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/56377905
