Q: Excluding empty columns when using concat_ws

Stack Overflow user

Asked 2018-08-21 03:52:59

Answers: 1 | Views: 282 | Followers: 0 | Votes: 2

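I have a DataFrame containing StringType columns. I need to concatenate three of the columns and put the result in a separate column. I am using concat_ws like this: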

from pyspark.context import SparkContext
from pyspark.sql import functions as f

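def myConcat(*cols):
    # Replace NULLs with empty strings, then join the columns with '-'
    return f.trim(f.concat_ws('-', *[f.coalesce(c, f.lit("")) for c in cols]))

# show() returns None, so its result is not assigned back to df
df.withColumn('Column1', myConcat(df['Column2'], df['Column3'], df['Column4'])).show()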

This is the expected result:

+--------------+-----------+---------+--------+
|Column1       |Column2    |Column3  |Column4 |
+--------------+-----------+---------+--------+
| abcd-efg-hij |   abcd    |      efg|  hij   |
| s675-klm     |   s675    |         |  klm   |
+--------------+-----------+---------+--------+

However, some of the columns are empty, and when I run the script above I get something like this:

+--------------+-----------+---------+--------+
|Column1       |Column2    |Column3  |Column4 |
+--------------+-----------+---------+--------+
| abcd-efg-hij |   abcd    |      efg|  hij   |
| s675--klm    |   s675    |         |  klm   |
+--------------+-----------+---------+--------+

Note the double -- in the second row of the result.

Is there a way to avoid this and get the desired result?

apache-spark

pyspark

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-08-21 04:04:36

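Replacing the empty strings in the string columns with NULL should work. concat_ws skips NULL values, and f.when without an otherwise clause returns NULL whenever its condition is not met: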

import pyspark.sql.functions as f

def myConcat(*cols):
    return f.trim(f.concat_ws('-', *[f.when(c != '', c) for c in cols]))

df.withColumn('Column1', myConcat(df['Column2'], df['Column3'], df['Column4'])).show()
#+-------+-------+-------+------------+
#|Column2|Column3|Column4|     Column1|
#+-------+-------+-------+------------+
#|   abcd|    efg|    hij|abcd-efg-hij|
#|   s675|       |    klm|    s675-klm|
#+-------+-------+-------+------------+
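
To make the difference between the two versions concrete, here is a minimal pure-Python sketch of the behavior. It is not Spark code: `concat_ws` and `empty_to_null` below are hypothetical stand-ins that only model how Spark's `concat_ws` skips NULLs (represented as `None`) and how `f.when(c != '', c)` turns empty strings into NULL.

```python
from typing import Optional


def concat_ws(sep: str, *values: Optional[str]) -> str:
    """Toy model of Spark's concat_ws: join values with sep, skipping None (NULL)."""
    return sep.join(v for v in values if v is not None)


def empty_to_null(value: Optional[str]) -> Optional[str]:
    """Toy model of f.when(c != '', c): keep non-empty strings, otherwise None."""
    # NULL input also stays NULL, as it would in Spark (NULL != '' is not true).
    return value if value is not None and value != "" else None


row = ("s675", "", "klm")  # second row from the question

# Question's approach: coalesce(c, lit("")) keeps empty strings in the join.
question_result = concat_ws("-", *[v if v is not None else "" for v in row])

# Answer's approach: empty strings become NULL and are skipped.
answer_result = concat_ws("-", *[empty_to_null(v) for v in row])

print(question_result)  # s675--klm
print(answer_result)    # s675-klm
```

The empty string still counts as a value to join, which is where the extra separator comes from; only NULL is dropped.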

Votes: 1

Original content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/51937668
