I am trying to rank the "Product" column by revenue, based on the following DataFrame, salesDF.
salesDF=
+-------------+-------+---------+----------+-------+
|transactionID|Product| category|produtType|Revenue|
+-------------+-------+---------+----------+-------+
| 105| Lenova| laptop| high| 40000|
| 111| Lenova| tablet| medium| 20000|
| 103| dell| laptop| medum| 25000|
| 107| iphone|cellPhone| small| 70000|
| 113| lenovo|cellPhone| medium| 8000|
| 108| mi|cellPhone| medum| 10000|
Below is the code I am using to rank each product by revenue with Spark:
rankTheRevenue= salesDF.createTempView("Ranking_DF")
rankProduct= session.sql("select Product, Revenue, rank() over(partion by Product order by Revenue) as Rank_revenue from Ranking_DF")
rankProduct.show()
But I am getting the following error:
pyspark.sql.utils.ParseException:
mismatched input '(' expecting {<EOF>, ',', 'CLUSTER', 'DISTRIBUTE', 'EXCEPT', 'FROM', 'GROUP', 'HAVING', 'INTERSECT', 'LATERAL', 'LIMIT', 'ORDER', 'MINUS', 'SORT', 'UNION', 'WHERE', 'WINDOW', '-'}(line 1, pos 36)
I would appreciate it if anyone could help me resolve this kind of problem.
Thanks
Posted on 2020-05-04 02:02:51
There is a typo in the partition by clause: it is written as partion by.
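To see how the reported position relates to the typo, here is a small sketch that marks offset 36 in the query string from the question. The helper `show_position` is a hypothetical name introduced only for illustration, and the offset is treated as 0-based because index 36 turns out to be the `(` named in the message. The error points at the `(` right after `over` rather than at the misspelled word itself; my reading, which the error message does not state, is that the misspelling keeps the text after `over` from being parsed as a window specification.

```python
# The query string from the question, with the misspelled keyword left in.
query = (
    "select Product, Revenue, rank() over(partion by Product "
    "order by Revenue) as Rank_revenue from Ranking_DF"
)

# Position reported by the ParseException ("line 1, pos 36").
pos = 36


def show_position(sql: str, position: int) -> str:
    """Return the query followed by a caret line under the given offset.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    return sql + "\n" + " " * position + "^"


print(show_position(query, pos))
print(repr(query[pos]))            # the character the parser complained about
print(repr(query[pos - 4:pos]))    # the four characters just before it
```

So when a ParseException points at a token that looks fine, it can be worth reading a few words to the right of the reported position as well.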
Try:
# createTempView returns None, so its result does not need to be assigned
salesDF.createTempView("Ranking_DF")
rankProduct= session.sql("select Product, Revenue, rank() over(partition by Product order by Revenue) as Rank_revenue from Ranking_DF")
rankProduct.show()
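As a rough illustration of what `rank() over(partition by Product order by Revenue)` computes, here is a plain-Python sketch that does not need Spark. The rows are the (Product, Revenue) pairs copied from the sample in the question, and `rank_within_partition` is a hypothetical helper written for this example, not a PySpark function.

```python
from collections import defaultdict

# (Product, Revenue) pairs copied from the salesDF sample in the question.
rows = [
    ("Lenova", 40000),
    ("Lenova", 20000),
    ("dell", 25000),
    ("iphone", 70000),
    ("lenovo", 8000),
    ("mi", 10000),
]


def rank_within_partition(rows):
    """Mimic rank() over (partition by Product order by Revenue).

    Ties share a rank and the next rank skips ahead, as SQL rank() does.
    Hypothetical helper for illustration only.
    """
    # "partition by Product": group the revenues by product name.
    partitions = defaultdict(list)
    for product, revenue in rows:
        partitions[product].append(revenue)

    ranked = []
    for product, revenues in partitions.items():
        # "order by Revenue": ascending is the SQL default.
        ordered = sorted(revenues)
        for revenue in ordered:
            # rank = 1 + number of rows in the partition that sort strictly earlier
            rank = 1 + sum(1 for other in ordered if other < revenue)
            ranked.append((product, revenue, rank))
    return ranked


for row in rank_within_partition(rows):
    print(row)
```

In this sample only Lenova appears more than once (lenovo is a different string), so it is the only product that gets a rank above 1. If the goal is to rank the products against each other, dropping the partition by clause, for example `rank() over(order by Revenue desc)`, may be closer to what was intended; that is a guess about the asker's intent, not something stated in the question.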
https://stackoverflow.com/questions/61572145