I have the following PySpark DataFrame:
+-------------------+----+---------+------+
|      timestamplast|ship|    X_pos|time_d|
+-------------------+----+---------+------+
|2019-08-01 00:00:00|   1|        3|     0|
|2019-08-01 00:00:09|   1|        4|     9|
|2019-08-01 00:00:20|   1|        5|    11|
|2019-08-01 00:00:27|   1|        9|     7|
|2019-08-01 00:00:38|   2|        3|     0|
|2019-08-01 00:00:39|   2|        8|     1|
|2019-08-01 00:00:57|   2|       20|    18|
+-------------------+----+---------+------+

Here timestamplast is a datetime and time_d is the time difference to the previous row within each "ship" group (time_d is zero on the first row of a new ship). I want to calculate the average velocity within each ship group, based on the time differences and the positions X_pos, and append the result to the data.
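The average velocity for ship == 1 is (1/9 + 1/11 + 4/7)/3 = 0.26 m/s, and for ship == 2 it is (5/1 + 12/18)/2 = 2.83 m/s.
Edit: written out with the positions, the average velocity for ship == 1 is ((4-3)/9 + (5-4)/11 + (9-5)/7)/3 = 0.26 m/s, and for ship == 2 it is ((8-3)/1 + (20-8)/18)/2 = 2.83 m/s.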
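As a sanity check on the arithmetic above, here is a minimal plain-Python sketch (no Spark involved) that recomputes the two averages from the sample rows. The function name `average_velocities` and the tuple layout are made up for illustration, and the rows are assumed to be already sorted by timestamp within each ship.

```python
from collections import defaultdict

# Rows copied from the sample table: (ship, X_pos, time_d), in timestamp order.
rows = [
    (1, 3, 0), (1, 4, 9), (1, 5, 11), (1, 9, 7),
    (2, 3, 0), (2, 8, 1), (2, 20, 18),
]


def average_velocities(rows):
    """Return {ship: mean of (change in X_pos / time_d)} over consecutive rows."""
    by_ship = defaultdict(list)
    for ship, x_pos, time_d in rows:
        by_ship[ship].append((x_pos, time_d))

    result = {}
    for ship, samples in by_ship.items():
        # Pair each row with its predecessor; the first row has no predecessor,
        # so a ship with n rows contributes n - 1 interval velocities.
        velocities = [
            (x - prev_x) / dt
            for (prev_x, _), (x, dt) in zip(samples, samples[1:])
        ]
        result[ship] = round(sum(velocities) / len(velocities), 2)
    return result


print(average_velocities(rows))  # {1: 0.26, 2: 2.83}
```

A ship with only one row has no intervals, so this sketch would raise a ZeroDivisionError for it; the sample data does not contain that case.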
The result should look like this:
+-------------------+----+---------+------+-----------+
|      timestamplast|ship|    X_pos|time_d|  avg_vel_x|
+-------------------+----+---------+------+-----------+
|2019-08-01 00:00:00|   1|        3|     0|       0.26|
|2019-08-01 00:00:09|   1|        4|     9|       0.26|
|2019-08-01 00:00:20|   1|        5|    11|       0.26|
|2019-08-01 00:00:27|   1|        9|     7|       0.26|
|2019-08-01 00:00:38|   2|        3|     0|       2.83|
|2019-08-01 00:00:39|   2|        8|     1|       2.83|
|2019-08-01 00:00:57|   2|       20|    18|       2.83|
+-------------------+----+---------+------+-----------+

Answered 2020-05-30 17:39:50
The pandas transform can be replicated with SQL-like window functions, and your expected output for ship == 1 should be 0.26. You can try:
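import pyspark.sql.functions as F
from pyspark.sql import Window

w = Window.partitionBy("ship")
# Per-interval velocity; null on each ship's first row, which has no previous X_pos.
velocity = (
    F.col("X_pos") - F.lag("X_pos").over(w.orderBy("timestamplast"))
) / F.col("time_d")
# sum() skips the null, so divide by the number of intervals (row count - 1).
avg_vel_x = F.round(F.sum(velocity).over(w) / (F.count("ship").over(w) - 1), 2)
df.withColumn("avg_vel_x", avg_vel_x).show()

+-------------------+----+-----+------+---------+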
| timestamplast|ship|X_pos|time_d|avg_vel_x|
+-------------------+----+-----+------+---------+
|2019-08-01 00:00:00| 1| 3| 0| 0.26|
|2019-08-01 00:00:09| 1| 4| 9| 0.26|
|2019-08-01 00:00:20| 1| 5| 11| 0.26|
|2019-08-01 00:00:27| 1| 9| 7| 0.26|
|2019-08-01 00:00:38| 2| 3| 0| 2.83|
|2019-08-01 00:00:39| 2| 8| 1| 2.83|
|2019-08-01 00:00:57| 2| 20| 18| 2.83|
+-------------------+----+-----+------+---------+

https://stackoverflow.com/questions/62105465
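To illustrate why the answer divides by the row count minus one, the following plain-Python sketch mimics the window expression for a single ship, using None in place of SQL NULL. The helper names `lag`, `velocity` and `window_average` are hypothetical and only imitate the Spark functions; the guard for a zero time_d is an assumption about how such rows should be treated, not something taken from the answer.

```python
def lag(values):
    """Shift values down by one row; the first row gets None, as with lag() over an ordered window."""
    return [None] + values[:-1]


def velocity(x, prev_x, dt):
    # NULL propagates through arithmetic, so a row without a predecessor yields None.
    # Assumption: a zero time difference is also treated as "no value".
    if prev_x is None or dt == 0:
        return None
    return (x - prev_x) / dt


def window_average(x_pos, time_d):
    """Average interval velocity for one ship, given columns already ordered by timestamp."""
    per_row = [velocity(x, p, dt) for x, p, dt in zip(x_pos, lag(x_pos), time_d)]
    # An SQL-style sum skips NULLs, so only real interval velocities are added up.
    non_null = [v for v in per_row if v is not None]
    # n rows give n - 1 intervals, hence the "count - 1" denominator.
    return round(sum(non_null) / (len(x_pos) - 1), 2)


print(window_average([3, 4, 5, 9], [0, 9, 11, 7]))  # 0.26
print(window_average([3, 8, 20], [0, 1, 18]))       # 2.83
```

Dividing by the plain row count instead would understate the average, because the first row of each ship never contributes a velocity.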