I have a df like this:
+---+-----+-----+----+
| M|M_Max|Sales|Rank|
+---+-----+-----+----+
| M1| 100| 200| 1|
| M1| 100| 175| 2|
| M1| 101| 150| 3|
| M1| 100| 125| 4|
| M1| 100| 90| 5|
| M1| 100| 85| 6|
| M2| 200| 1001| 1|
| M2| 200| 500| 2|
| M2| 201| 456| 3|
| M2| 200| 345| 4|
| M2| 200| 231| 5|
| M2| 200| 123| 6|
+---+-----+-----+----+

I am doing a pivot operation on this df, like this:
df.groupBy("M").pivot("Rank").agg(first("Sales")).show
+---+----+---+---+---+---+---+
| M| 1| 2| 3| 4| 5| 6|
+---+----+---+---+---+---+---+
| M1| 200|175|150|125| 90| 85|
| M2|1001|500|456|345|231|123|
+---+----+---+---+---+---+---+

But my expected output is shown below. That is, I also need a column Max(M_Max) in the output, where M_Max is the maximum of the M_Max column. Can the pivot function do this without using a DataFrame join?
+---+----+---+---+---+---+---+-----+
| M| 1| 2| 3| 4| 5| 6|M_Max|
+---+----+---+---+---+---+---+-----+
| M1| 200|175|150|125| 90| 85| 101|
| M2|1001|500|456|345|231|123| 201|
+---+----+---+---+---+---+---+-----+

Posted on 2020-03-03 15:19:12
The trick is to apply a window function: the windowed max of M_Max is constant within each M partition, so it can be added to the groupBy without changing the grouping. The solution is as follows:
scala> val df = Seq(
     |   ("M1",100,200,1),
     |   ("M1",100,175,2),
     |   ("M1",101,150,3),
     |   ("M1",100,125,4),
     |   ("M1",100,90,5),
     |   ("M1",100,85,6),
     |   ("M2",200,1001,1),
     |   ("M2",200,500,2),
     |   ("M2",200,456,3),
     |   ("M2",200,345,4),
     |   ("M2",200,231,5),
     |   ("M2",201,123,6)
     | ).toDF("M","M_Max","Sales","Rank")
df: org.apache.spark.sql.DataFrame = [M: string, M_Max: int ... 2 more fields]
scala> import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.expressions.Window
scala> val w = Window.partitionBy("M")
w: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@49b4e11c
scala> df.withColumn("new", max("M_Max") over (w)).groupBy("M", "new").pivot("Rank").agg(first("Sales")).withColumnRenamed("new", "M_Max").show
+---+-----+----+---+---+---+---+---+
| M|M_Max| 1| 2| 3| 4| 5| 6|
+---+-----+----+---+---+---+---+---+
| M1| 101| 200|175|150|125| 90| 85|
| M2| 201|1001|500|456|345|231|123|
+---+-----+----+---+---+---+---+---+
scala> df.show
+---+-----+-----+----+
| M|M_Max|Sales|Rank|
+---+-----+-----+----+
| M1| 100| 200| 1|
| M1| 100| 175| 2|
| M1| 101| 150| 3|
| M1| 100| 125| 4|
| M1| 100| 90| 5|
| M1| 100| 85| 6|
| M2| 200| 1001| 1|
| M2| 200| 500| 2|
| M2| 200| 456| 3|
| M2| 200| 345| 4|
| M2| 200| 231| 5|
| M2| 201| 123| 6|
+---+-----+-----+----+

Let me know if this helps!!
Posted on 2020-03-05 06:18:44
Basically, I see three possible approaches:

1. Computing the max of M_Max separately and joining it back (a sketch is given below).
2. The window function proposed in the other answer.
3. Taking the max of M_Max within the pivot and aggregating the generated columns with array_max.

Most likely, approach 1 is less efficient. Between 2 and 3, however, I am not sure. You could try with your data and tell us ;-)
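For reference, here is a minimal sketch of approach 1, assuming the df defined above (the join is exactly what the question hoped to avoid, so it is shown only for comparison):

import org.apache.spark.sql.functions.{first, max}

// compute the per-M max separately
val maxDf = df.groupBy("M").agg(max("M_Max") as "M_Max")

// pivot as in the question, then join the max back on M
df.groupBy("M").pivot("Rank").agg(first("Sales"))
  .join(maxDf, Seq("M"))
  .show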
Approach 3 goes as follows:
val df = Seq(
("M1", 100, 200, 1), ("M1", 100, 175, 2), ("M1", 101, 150, 3),
("M1", 100, 125, 4), ("M1", 100, 90, 5), ("M1", 100, 85, 6),
("M2", 200, 1001, 1), ("M2", 200, 500, 2), ("M2", 200, 456, 3),
("M2", 200, 345, 4), ("M2", 200, 231, 5), ("M2", 201, 123, 6)
).toDF("M","M_Max","Sales","Rank")
// we include the max in the pivot, so we have one max column per rank
val df_pivot = df
.groupBy("M").pivot("Rank")
.agg(first('Sales) as "first", max('M_Max) as "max")
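// (with multiple aggregations, pivot names the output columns "<rank>_<alias>", e.g. "1_first", "1_max")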
val max_cols = df_pivot.columns.filter(_ endsWith "max").map(col)
// then we aggregate these max columns into one
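// (array_max requires Spark 2.4+; on older versions, functions.greatest(max_cols: _*) could be used instead)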
val max_col = array_max(array(max_cols : _*)) as "M_Max"
// let's rename the first columns to match your expected output
val first_cols = df_pivot.columns.filter(_ endsWith "first")
.map(name => col(name) as name.split("_")(0))
// And finally, we wrap everything together
df_pivot
.select($"M" +: first_cols :+ max_col : _*)
.show(false)

which yields:
+---+----+---+---+---+---+---+-----+
|M |1 |2 |3 |4 |5 |6 |M_Max|
+---+----+---+---+---+---+---+-----+
|M1 |200 |175|150|125|90 |85 |101 |
|M2 |1001|500|456|345|231|123|201 |
+---+----+---+---+---+---+---+-----+

https://stackoverflow.com/questions/60509684