你好,我是新来的火花,我有两个数据帧,这样:
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Region| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Paris| 0| 0| 0| 1| 7| 0| 5|
+--------------+-------+-------+-------+-------+-------+-------+-------+
+----------+-------+
| Period|Reports|
+----------+-------+
|2020/07/20| 0|
|2020/07/21| 0|
|2020/07/22| 0|
|2020/07/23| 8|
|2020/07/24| 0|
|2020/07/25| 1|
+----------+-------+
如何删除第一个0值连续列3/7/20,3/8/20,3/9/20,而不删除第3/12/20栏?同样,对于第二个数据,如何删除第3/12/20、0和2020/07/21、0和2020/07/22、0和2020/07/22行,而不删除2020/07/22、0行
发布于 2020-08-10 08:30:46
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
val df=Seq(("0","0","0","1","7","0","5")).toDF("3/7/20","3/8/20","3/9/20","3/10/20","3/11/20","3/12/20","3/13/20")
var columnsAndValues = df.columns.flatMap { c => Array(lit(c), col(c)) }
df.printSchema()
val df1 = df.withColumn("myMap", map(columnsAndValues:_*)).select(explode($"myMap"))
.toDF("Region","Paris")
val windowSpec = Window.partitionBy(lit("A")).orderBy(lit("A"))
df1.withColumn("row_number",row_number.over(windowSpec))
.withColumn("lag", lag("Paris", 1, 0).over(windowSpec))
.withColumn("lead", lead("Paris", 1, 0)
.over(windowSpec)).where(($"lag">0) or ($"Paris"> 0)).show()
/*
+-------+-----+----------+---+----+
| Region|Paris|row_number|lag|lead|
+-------+-----+----------+---+----+
|3/10/20| 1| 4| 0| 7|
|3/11/20| 7| 5| 1| 0|
|3/12/20| 0| 6| 7| 5|
|3/13/20| 5| 7| 0| 0|
+-------+-----+----------+---+----+
*/
val df2=Seq(("2020/07/20","0"),("2020/07/21","0"),("2020/07/22","0"),("2020/07/23","8"),("2020/07/24","0"),("2020/07/25","1")).toDF("Period","Reports")
df2.withColumn("row_number",row_number.over(windowSpec))
.withColumn("lag", lag("Reports", 1, 0).over(windowSpec))
.withColumn("lead", lead("Reports", 1, 0).over(windowSpec))
.where((($"lag">0) or ($"Reports"> 0)) and ($"row_number">1)).show()
/*
+----------+-------+----------+---+----+
| Period|Reports|row_number|lag|lead|
+----------+-------+----------+---+----+
|2020/07/23| 8| 4| 0| 0|
|2020/07/24| 0| 5| 8| 1|
|2020/07/25| 1| 6| 0| 0|
+----------+-------+----------+---+----+
*/
https://stackoverflow.com/questions/63342587
复制相似问题