现有关注表 t_follow,包含from_user:关注者,to_user:被关注者。如果两个用户互相关注,则代表他们是好友,请找出是好友的关系对;
数据
+------------+----------+
| from_user | to_user |
+------------+----------+
| A | B |
| A | C |
| A | D |
| B | A |
| B | E |
| C | A |
+------------+----------+
本题属于很长见,历史也很悠久的题目了,在非大数据时代就开始考,着重考察的是在大数据量的时候该如何解决。今天给出三种解决方案,分别为常规解法、union all解法,排序解法;
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
所谓常规解法,即想要查询是否互相关注,使用表自关联,让from_user和to_user交叉相等即可。
select
t1.from_user,
t1.to_user
from
t_follow t1
join t_follow t2
on t1.from_user=t2.to_user
and t1.to_user = t2. from_user;
执行结果
+---------------+-------------+
| t1.from_user | t1.to_user |
+---------------+-------------+
| B | A |
| C | A |
| A | B |
| A | C |
+---------------+-------------+
为了避免使用join,尤其是在大量数据场景下join消耗资源较大,所以出现了使用union all的方式。首先查询一遍t_follow表,然后将from_user 和to_user 互换在查询一遍。如果两个用户互相关注,则会出现两条相同的记录。只要找到数据行数=2的记录即为互相关注的用户。
执行SQL
select
from_user,
to_user,
count(1)
from
(
select
from_user,
to_user
from t_follow
union all
select
to_user,
from_user
from t_follow
) t
group by from_user,to_user
having count(1)= 2
执行结果
+------------+----------+------+
| from_user | to_user | _c2 |
+------------+----------+------+
| A | B | 2 |
| A | C | 2 |
| B | A | 2 |
| C | A | 2 |
+------------+----------+------+
因为只有两列,如果A关注了B,B也关注了A,新生成一列,拼接from_user和to_user,但是要求两个的是有序的,则得到相同的相同的值。我们根据相同的值进行分组统计,就能得到互相关注的组。
拼接SQL
select
from_user,
to_user,
if(from_user<to_user, concat_ws('-',from_user,to_user),concat_ws('-',to_user,from_user)) as new_str
from t_follow
执行结果
+------------+----------+----------+
| from_user | to_user | new_str |
+------------+----------+----------+
| A | B | A-B |
| A | C | A-C |
| A | D | A-D |
| B | A | A-B |
| B | E | B-E |
| C | A | A-C |
+------------+----------+----------+
可以看到new_str中,A关注B和B关注A都是A-B
查询结果SQL
select
if(from_user<to_user, concat_ws('-',from_user,to_user),concat_ws('-',to_user,from_user)) as new_str,
count(1)
from t_follow
group by if(from_user<to_user, concat_ws('-',from_user,to_user),concat_ws('-',to_user,from_user))
having count(1) =2;
执行结果
+----------+------+
| new_str | _c1 |
+----------+------+
| A-B | 2 |
| A-C | 2 |
+----------+------+
CREATE TABLE IF NOT EXISTS t_follow (
from_user STRING, --关注者
to_user STRING --被关注者
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC
insert into t_follow(from_user, to_user) values
('A','B'),
('A','C'),
('A','D'),
('B','A'),
('B','E'),
('C','A')