I want to combine two DataFrames, a and b, into a single DataFrame c that is sorted on a column.
val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")
val c = // how do I sort on char column?
Here is the result I want:
a.show()      b.show()      c.show()
+----+---+    +----+---+    +----+---+
|char|num|    |char|num|    |char|num|
+----+---+    +----+---+    +----+---+
|   a|  1|    |   b|  4|    |   a|  1|
|   c|  2|    |   d|  5|    |   b|  4|
|   e|  3|    +----+---+    |   c|  2|
+----+---+                  |   d|  5|
                            |   e|  3|
                            +----+---+
Posted on 2017-03-15 06:15:45
Simply put, you can sort() each DataFrame, union() them, and then sort() the combined result.
val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num").sort($"char")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num").sort($"char")
val c = a.union(b).sort($"char")
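For anyone running this outside spark-shell, here is a minimal self-contained sketch of the same idea. It assumes a local SparkSession; the object name UnionSortSketch and the helper unionSorted are hypothetical names chosen for illustration, not part of the answer above. Two things worth keeping in mind: union() matches columns by position rather than by name, and only the sort applied after the union determines the row order of c, so the sorts on a and b individually do not change the result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionSortSketch {
  // Hypothetical helper: append b's rows to a's, then sort the combined rows.
  // union() pairs columns by position, so both inputs need the same column order.
  def unionSorted(a: DataFrame, b: DataFrame, sortCol: String): DataFrame =
    a.union(b).orderBy(sortCol)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-sort-sketch")
      .getOrCreate()
    import spark.implicits._

    val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
    val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")

    val c = unionSorted(a, b, "char")
    c.show()

    spark.stop()
  }
}
```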
Posted on 2018-05-22 09:59:08
If you want to merge more than two DataFrames, you can try something like this.
val df1 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")
val df2 = sc.parallelize(List(
  (51, 3, "jane"),
  (35, 5, "bob")
)).toDF("age", "children", "name")
val df3 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")
val result = Seq(df1, df2, df3)
val res_union = result.reduce(_ union _).sort($"age", $"name", $"children")
res_union.show()
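One caveat with the snippet above: union() in Spark keeps duplicate rows (it behaves like SQL UNION ALL), and df1 and df3 hold the same rows, so those rows appear twice in res_union. The sketch below is one possible variation, not the answerer's code: it assumes Spark 2.3 or later for unionByName (which matches columns by name instead of position) and uses a hypothetical helper unionAllSorted with an optional distinct() step.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionManySketch {
  // Hypothetical helper: merge any non-empty list of DataFrames by column name,
  // optionally drop duplicate rows, then sort. reduce throws on an empty Seq.
  def unionAllSorted(dfs: Seq[DataFrame], dedupe: Boolean): DataFrame = {
    val merged = dfs.reduce((left, right) => left.unionByName(right))
    val rows = if (dedupe) merged.distinct() else merged
    rows.orderBy("age", "name", "children")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-many-sketch")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq((50, 2, "arjun"), (34, 4, "bob")).toDF("age", "children", "name")
    val df2 = Seq((51, 3, "jane"), (35, 5, "bob")).toDF("age", "children", "name")
    val df3 = Seq((50, 2, "arjun"), (34, 4, "bob")).toDF("age", "children", "name")

    unionAllSorted(Seq(df1, df2, df3), dedupe = false).show() // keeps repeated rows
    unionAllSorted(Seq(df1, df2, df3), dedupe = true).show()  // repeated rows removed

    spark.stop()
  }
}
```

With dedupe = false the six input rows are all kept; with dedupe = true the two rows shared by df1 and df3 appear only once.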
https://stackoverflow.com/questions/42798697