I'm trying to compute a daily acceptance rate from the 'connecting' table, which has 4 fields. Sample values:
date action sender_id recipient_id
'2017-01-05', 'request_link', 'frank', 'joe'
'2017-01-06', 'request_link', 'sally', 'ann'
'2017-01-07', 'request_link', 'bill', 'ted'
'2017-01-07', 'accept_link', 'joe', 'frank'
'2017-01-06', 'accept_link', 'ann', 'sally'
'2017-01-06', 'accept_link', 'ted', 'bill'
Since 01-05 has 0 accepts and 1 request, that day's acceptance rate should be 0/1 = 0. Likewise, the ratio for 01-06 should be 2/1 and the ratio for 01-07 should be 1/1.
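However, it's important that every accept_link has a corresponding request_link, where the request_link's sender_id = the accept_link's recipient_id (and vice versa). So I think a self join is needed here, to make sure that Joe is accepting Frank's request, regardless of the date.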
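To make the pairing condition concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's built-in sqlite3 module and lists each accept_link next to the request_link it answers. SQLite is only a stand-in for the real database, and the aliases `a` and `r` and the output column names are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (date, action, sender_id, recipient_id)
rows = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connecting (date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)"
)
conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)

# Pair each accept (a) with the request (r) whose sender and recipient are swapped.
# The date is deliberately not part of the join condition.
pairs = conn.execute(
    """
    SELECT a.date AS accept_date,
           a.sender_id AS accepter,
           r.sender_id AS requester,
           r.date AS request_date
    FROM connecting a
    JOIN connecting r
      ON r.action = 'request_link'
     AND a.sender_id = r.recipient_id
     AND a.recipient_id = r.sender_id
    WHERE a.action = 'accept_link'
    ORDER BY a.date, a.sender_id
    """
).fetchall()

for pair in pairs:
    print(pair)
```

With the sample data, every accept finds exactly one matching request, including Ted's accept, which is dated a day before Bill's request.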
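How do I correct the query below so that the aggregation is performed correctly while keeping the required join conditions? If the two WHERE conditions are removed, will the query compute correctly as is, or are they required?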
SELECT f1.date,
SUM(CASE WHEN f2.action = 'accept_link' THEN 1 ELSE 0 END) /
SUM(CASE WHEN f2.action = 'request_link' THEN 1 ELSE 0 END) AS acceptance_ratio
FROM connecting f1
LEFT JOIN connecting f2
ON f1.sender_id = f2.recipient_id
LEFT JOIN connecting f2
ON f1.recipient_id = f2.sender_id
WHERE f1.action = 'request_link'
AND f2.action = 'accept_link'
GROUP BY f1.date
ORDER BY f1.date ASC
The expected output should look like this:
date acceptance_ratio
'2017-01-05' 0.0000
'2017-01-06' 2.0000
'2017-01-07' 1.0000
Thanks in advance.
Answered 2017-02-09 16:42:18
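Once again, I don't think you need a self join to do the aggregation itself. Instead, use conditional aggregation over the whole table and count the requests and accepts that occurred on each day. The LEFT JOIN in the query below only checks that each accept has a matching request: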
SELECT t.date,
CASE WHEN t.num_requests = 0
THEN 'No requests available'
ELSE CAST(t.num_accepts / t.num_requests AS CHAR(50))
END AS acceptance_ratio
FROM
(
SELECT c1.date,
SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
THEN 1 ELSE 0 END) AS num_accepts,
SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
FROM connecting c1
LEFT JOIN connecting c2
ON c1.action = 'accept_link' AND
c2.action = 'request_link' AND
c1.sender_id = c2.recipient_id AND
c1.recipient_id = c2.sender_id
GROUP BY c1.date
) t
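ORDER BY t.date
Note that I use a CASE expression here to handle division by zero, which can occur if there are no requests on a given day. I also assume that the same invitation is never sent more than once.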
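As a rough check, here is a minimal sketch that runs the same idea against the sample rows from the question, using Python's built-in sqlite3 module. SQLite is an assumption here, not necessarily the database in use. Two things differ from the query above and are my own choices for the sketch: the ratio is multiplied by 1.0 because SQLite would otherwise do integer division, and days without requests yield NULL rather than a text message so the column stays numeric. The join matches on both swapped ids, as the question describes.

```python
import sqlite3

# Sample rows from the question: (date, action, sender_id, recipient_id)
rows = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connecting (date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)"
)
conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)

query = """
SELECT t.date,
       CASE WHEN t.num_requests = 0 THEN NULL            -- avoid division by zero
            ELSE 1.0 * t.num_accepts / t.num_requests    -- 1.0 forces real division in SQLite
       END AS acceptance_ratio
FROM (
    SELECT c1.date,
           -- count an accept only if a matching request exists
           SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                    THEN 1 ELSE 0 END) AS num_accepts,
           SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
    FROM connecting c1
    LEFT JOIN connecting c2
      ON c1.action = 'accept_link'
     AND c2.action = 'request_link'
     AND c1.sender_id = c2.recipient_id
     AND c1.recipient_id = c2.sender_id
    GROUP BY c1.date
) t
ORDER BY t.date
"""

result = conn.execute(query).fetchall()
for date, ratio in result:
    print(date, ratio)
```

On the sample data this gives ratios of 0.0, 2.0 and 1.0 for the three days, which matches the expected output in the question. Because the join conditions sit in the ON clause and there is no WHERE filter, request rows are kept with a NULL c2 and still count toward num_requests.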
https://stackoverflow.com/questions/42130476