I wrote this in PySpark:
from pyspark.sql.functions import date_format

result = \
df.select('*', date_format('window_start', 'yyyy-MM-dd hh:mm').alias('time_window')) \
.groupby('time_window') \
.agg({'total_score': 'sum'})
result.show()

I want to run this in Scala, so I tried the following, but I get an error that I haven't been able to resolve because I'm new to Scala:
val result=df.select('*', date_format(df("time_window"),"yyyy-MM-dd hh:mm").alias("time_window"))
.groupBy("time_window")
.agg(sum("total_score"))错误说
overloaded method value select with alternatives:
  [U1, U2](c1: org.apache.spark.sql.TypedColumn[org.apache.spark.sql.Row,U1], c2: org.apache.spark.sql.TypedColumn[org.apache.spark.sql.Row,U2])org.apache.spark.sql.Dataset[(U1, U2)] <and>
  (cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
cannot be applied to (Char, org.apache.spark.sql.Column)  Process.scala  /Process/src  line 30  Scala Problem
How can I fix this code so that it runs under Scala?
Posted on 2017-05-25 09:10:06
This works like your PySpark code. The root cause of your error is that in Scala, '*' (in single quotes) is a Char literal, not a column, so no overload of select accepts it; refer to columns explicitly, for example with col("*") or df("columnName").
import spark.implicits._                                  // for .toDF on the RDD
import org.apache.spark.sql.functions.{date_format, sum}

val data = spark.sparkContext.parallelize(Seq(
  ("2017-05-21", 1),
  ("2017-05-21", 1),
  ("2017-05-22", 1),
  ("2017-05-22", 1),
  ("2017-05-23", 1),
  ("2017-05-23", 1),
  ("2017-05-23", 1),
  ("2017-05-23", 1))).toDF("time_window", "foo")

// "$time_window" is just an ordinary column name that happens to contain '$'
data.withColumn("$time_window", date_format(data("time_window"), "yyyy-MM-dd hh:mm"))
  .groupBy("$time_window")
  .agg(sum("foo"))
  .show()

https://stackoverflow.com/questions/42487476