JOIN

最近更新时间:2026-05-20 14:11:22

我的收藏

描述

Spark SQL 支持多种 JOIN 类型:INNER、LEFT、RIGHT、FULL、CROSS、SEMI、ANTI JOIN。

语法

relation { [ join_type ] JOIN relation [ join_criteria ] | NATURAL join_type JOIN relation }

参数

子句/关键字
说明
relation
指定要连接的关系(表/数据集)
join_type
指定连接类型
语法:[ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTI
INNER:只返回两个关系中都满足连接条件的行。
LEFT [OUTER]:返回左关系的所有行,右关系中无匹配的行用 NULL 填充。
RIGHT [OUTER]:返回右关系的所有行,左关系中无匹配的行用 NULL 填充。
FULL [OUTER]:返回两个关系的所有行,无匹配的一侧用 NULL 填充。
CROSS:返回两个关系的笛卡尔积。
join_criteria
指定行组合条件,
语法:​ON boolean_expression | USING ( column_name [ , ... ] )
boolean_expression:指定一个返回值为布尔类型的表达式

示例

-- 创建测试表
CREATE EXTERNAL TABLE join_t1 (id INT, name STRING)
USING PARQUET LOCATION 'cosn://<your_cos_bucket>/test_join/join_t1';
CREATE EXTERNAL TABLE join_t2 (id INT, dept STRING)
USING PARQUET LOCATION 'cosn://<your_cos_bucket>/test_join/join_t2';
INSERT INTO join_t1 VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie');
INSERT INTO join_t2 VALUES (1,'Eng'),(2,'Sales'),(4,'HR');

-- INNER JOIN
SELECT * FROM join_t1 INNER JOIN join_t2 ON join_t1.id = join_t2.id;

-- LEFT JOIN
SELECT * FROM join_t1 LEFT JOIN join_t2 ON join_t1.id = join_t2.id;

-- RIGHT JOIN
SELECT * FROM join_t1 RIGHT JOIN join_t2 ON join_t1.id = join_t2.id;

-- FULL OUTER JOIN
SELECT * FROM join_t1 FULL OUTER JOIN join_t2 ON join_t1.id = join_t2.id;

-- CROSS JOIN
SELECT * FROM join_t1 CROSS JOIN join_t2;

-- SEMI JOIN
SELECT * FROM join_t1 WHERE id IN (SELECT id FROM join_t2);

-- ANTI JOIN
SELECT * FROM join_t1 WHERE NOT EXISTS (SELECT 1 FROM join_t2 WHERE join_t2.id = join_t1.id);

-- 多表 JOIN
SELECT * FROM join_t1 JOIN join_t2 ON join_t1.id = join_t2.id
LEFT JOIN join_t1 j3 ON j3.id + 1 = join_t1.id;