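I have a table with City and ComplaintType columns.
I am trying to create a normalized column computed as follows:
(pseudo) SELECT number of complaints of a specific type in a specific city / (number of all complaints in that city)
I currently have the following SQL: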
SELECT City AS city_name, ComplaintType AS complaint_type,
count(*) / (SELECT count(City) FROM data GROUP BY City) AS complaint_frac,
count(*) AS count_freq,
(SELECT count(City) FROM data GROUP BY City) AS count_city
FROM data
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC
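This gives me the following table:
The total number of complaints per city (count_city) is incorrect. However, when I run the count_city query on its own, the counts are correct and give the following output:
How can I correctly associate count_city with the number of complaints of each type per city, so that the correct fraction is computed?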
Example with hard numbers:
Bronx & Hot Water = 79690
Bronx (total complaints) = 579363
complaint_frac = 79690 / 579363 = 0.13754761695
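Posted on 2019-10-03 19:07:57
Correlate the subquery with the main table. As written, the subquery returns one count per city and is not tied to the city of the outer row.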
SELECT City AS city_name, ComplaintType AS complaint_type,
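-- the denominator uses the same correlated subquery as count_city, so it is the total for the outer row's city
count(*) / (SELECT count(d1.City) FROM data d1 WHERE d1.City = d2.City) AS complaint_frac,
count(*) AS count_freq,
(SELECT count(d1.City) FROM data d1 WHERE d1.City = d2.City) AS count_city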
FROM data d2
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC
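If you want to try the correlated-subquery approach without the real data, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The toy rows and complaint type names are made up, and SQLite is only a stand-in for whatever engine you are using. The `* 1.0` is there because SQLite does integer division on two integer counts; MySQL's `/` does not need it. The extra ORDER BY columns are just to make the output order deterministic.

```python
import sqlite3

# Hypothetical toy rows; the real table from the question is not shown.
rows = [
    ("BRONX", "HEAT"), ("BRONX", "HEAT"), ("BRONX", "HEAT"), ("BRONX", "NOISE"),
    ("QUEENS", "HEAT"), ("QUEENS", "NOISE"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (City TEXT, ComplaintType TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# Both subqueries are correlated on d2.City, so each one counts only
# the rows of the city that the outer group belongs to.
correlated_query = """
SELECT d2.City AS city_name, d2.ComplaintType AS complaint_type,
       count(*) * 1.0 / (SELECT count(d1.City) FROM data d1
                         WHERE d1.City = d2.City) AS complaint_frac,
       count(*) AS count_freq,
       (SELECT count(d1.City) FROM data d1
        WHERE d1.City = d2.City) AS count_city
FROM data d2
GROUP BY d2.City, d2.ComplaintType
ORDER BY complaint_frac DESC, city_name, complaint_type
"""

correlated_result = conn.execute(correlated_query).fetchall()
for row in correlated_result:
    print(row)
```

With these toy rows the Bronx total is 4 and the Queens total is 2, so every count_city value should match the per-city total rather than a single count repeated on every row.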
Posted on 2019-10-03 19:50:38
You don't need a subquery for this, at least in MySQL 8+; window functions will do the job:
SELECT City AS city_name, ComplaintType AS complaint_type,
count(*) / sum(count(*)) over (partition by city) as complaint_frac,
count(*) as count_freq,
sum(count(*)) over (partition by city) as count_city
FROM data
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC
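A quick way to check the window-function version is to run it on a few made-up rows. The sketch below uses Python's sqlite3 module as a stand-in for MySQL 8+, which assumes the bundled SQLite is 3.25 or newer (the first release with window functions). The table contents are hypothetical, and the `* 1.0` is only needed because SQLite would otherwise do integer division.

```python
import sqlite3

# Hypothetical toy rows; the real table from the question is not shown.
toy_rows = [
    ("BRONX", "HEAT"), ("BRONX", "HEAT"), ("BRONX", "HEAT"), ("BRONX", "NOISE"),
    ("QUEENS", "HEAT"), ("QUEENS", "NOISE"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE data (City TEXT, ComplaintType TEXT)")
db.executemany("INSERT INTO data VALUES (?, ?)", toy_rows)

# count(*) is computed per (City, ComplaintType) group first; the window
# then sums those group counts across all groups sharing the same City.
window_query = """
SELECT City AS city_name, ComplaintType AS complaint_type,
       count(*) * 1.0 / sum(count(*)) OVER (PARTITION BY City) AS complaint_frac,
       count(*) AS count_freq,
       sum(count(*)) OVER (PARTITION BY City) AS count_city
FROM data
GROUP BY City, ComplaintType
ORDER BY complaint_frac DESC, city_name, complaint_type
"""

window_result = db.execute(window_query).fetchall()
for row in window_result:
    print(row)
```

The table is scanned once here, and the per-city total comes from summing the grouped counts, so there is no subquery to correlate.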
https://stackoverflow.com/questions/58229350