描述
Spark SQL 支持多种 JOIN 类型:INNER、LEFT、RIGHT、FULL、CROSS、SEMI、ANTI JOIN。
语法
relation { [ join_type ] JOIN relation [ join_criteria ] | NATURAL join_type JOIN relation }
参数
子句/关键字 | 说明 |
relation | 指定要连接的关系(表/数据集) |
join_type | 指定连接类型 语法: [ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTIINNER:只返回两个关系中都满足连接条件的行。LEFT [OUTER]:返回左关系的所有行,右关系中无匹配的行用 NULL 填充。RIGHT [OUTER]:返回右关系的所有行,左关系中无匹配的行用 NULL 填充。FULL [OUTER]:返回两个关系的所有行,无匹配的一侧用 NULL 填充。CROSS:返回两个关系的笛卡尔积。 |
join_criteria | 指定行组合条件, 语法: ON boolean_expression | USING ( column_name [ , ... ] )boolean_expression:指定一个返回值为布尔类型的表达式 |
示例
-- 创建测试表CREATE EXTERNAL TABLE join_t1 (id INT, name STRING)USING PARQUET LOCATION 'cosn://<your_cos_bucket>/test_join/join_t1';CREATE EXTERNAL TABLE join_t2 (id INT, dept STRING)USING PARQUET LOCATION 'cosn://<your_cos_bucket>/test_join/join_t2';INSERT INTO join_t1 VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie');INSERT INTO join_t2 VALUES (1,'Eng'),(2,'Sales'),(4,'HR');-- INNER JOINSELECT * FROM join_t1 INNER JOIN join_t2 ON join_t1.id = join_t2.id;-- LEFT JOINSELECT * FROM join_t1 LEFT JOIN join_t2 ON join_t1.id = join_t2.id;-- RIGHT JOINSELECT * FROM join_t1 RIGHT JOIN join_t2 ON join_t1.id = join_t2.id;-- FULL OUTER JOINSELECT * FROM join_t1 FULL OUTER JOIN join_t2 ON join_t1.id = join_t2.id;-- CROSS JOINSELECT * FROM join_t1 CROSS JOIN join_t2;-- SEMI JOINSELECT * FROM join_t1 WHERE id IN (SELECT id FROM join_t2);-- ANTI JOINSELECT * FROM join_t1 WHERE NOT EXISTS (SELECT 1 FROM join_t2 WHERE join_t2.id = join_t1.id);-- 多表 JOINSELECT * FROM join_t1 JOIN join_t2 ON join_t1.id = join_t2.idLEFT JOIN join_t1 j3 ON j3.id + 1 = join_t1.id;