窗口函数

最近更新时间:2026-05-06 16:28:13

我的收藏
说明:
适用版本:TCHouse-X 内核版本 2.0.0及以上版本。
窗口函数在与当前行相关的行集合(即“窗口”)上执行计算。与聚合函数不同,窗口函数不会将多行压缩为一行,而是为每一行保留其原始身份并附带计算结果。

核心语法

所有窗口函数都必须配合 OVER 子句使用:
function(args) OVER (
[PARTITION BY expression]
[ORDER BY expression [ASC|DESC]]
[frame]
)

关键概念

PARTITION BY:将数据集划分为多个分区。计算在每个分区内独立进行。若省略,则整个结果集视为一个分区。
ORDER BY:定义分区内行的排列顺序。这对于排名函数和涉及物理范围(如累加)的计算至关重要。
frame: 指定函数在处理特定输入行时所涵盖的滑动行窗口。
范围可以是 ROWS(基于行数)或 RANGE(基于逻辑值范围),从 frame_start 延伸到 frame_end
如果未指定 frame_end,则默认使用 CURRENT ROW(当前行)。

常用窗口函数

排名与编号函数

函数
排序方式
特点
示例结果 (1, 1, 2)
row_number()
唯一编号
即使值相同,编号也不重复。
1, 2, 3
rank()
跳跃排名
值相同时排名并列,但会“跳过”后续编号。
1, 1, 3
dense_rank()
连续排名
值相同时排名并列,且编号保持连续。
1, 1, 2
ntile(n)
切片/分桶
将数据均匀分成 n 份,返回所属桶号。
1, 1, 2 (n=2)
WITH sales_data AS (
SELECT 'Alice' AS name, 'Sales' AS dept, 5000 AS amount UNION ALL
SELECT 'Bob' AS name, 'Sales' AS dept, 5000 AS amount UNION ALL
SELECT 'Chris' AS name, 'Sales' AS dept, 4000 AS amount UNION ALL
SELECT 'David' AS name, 'Sales' AS dept, 3000 AS amount UNION ALL
SELECT 'Eve' AS name, 'Sales' AS dept, 2000 AS amount
)
SELECT
name,
amount,
row_number() OVER (ORDER BY amount DESC) AS row_num,
rank() OVER (ORDER BY amount DESC) AS rk,
dense_rank() OVER (ORDER BY amount DESC) AS d_rk,
ntile(2) OVER (ORDER BY amount DESC) AS bucket
FROM sales_data;

-- 结果:
-- name,amount,row_num,rk,d_rk,bucket
-- Alice,5000,1,1,1,1
-- Bob,5000,2,1,1,1
-- Chris,4000,3,3,2,1
-- David,3000,4,3,2
-- Eve,2000,5,5,4,2

偏移与取值函数

这类函数用于跨行获取数据,常用于计算同比、环比或寻找临界点。
函数
功能描述
核心参数
lag(x, o, d)
向下偏移。获取当前行之前的第 o 个值。
o: 偏移量, d: 默认值
lead(x, o, d)
向上偏移。获取当前行之后的第 o 个值。
o: 偏移量, d: 默认值
first_value(x)
返回窗口内的第一个值。
常配合 ORDER BY 使用
last_value(x)
返回窗口内的最后一个值。
常配合 ORDER BY 使用
nth_value(x, n)
返回窗口内第 n 个指定值。
常配合 ORDER BY 使用
-- 观察用户每日登录次数的变化(环比增长)。
WITH daily_logins AS (
SELECT '2026-01-01' AS `date`, 100 AS cnt UNION ALL
SELECT '2026-01-02' AS `date`, 120 AS cnt UNION ALL
SELECT '2026-01-03' AS `date`, 110 AS cnt UNION ALL
SELECT '2026-01-04' AS `date`, 150 AS cnt
)
SELECT
`date`,
cnt AS "当日登录数",
-- lag(x, o, d): 获取前 1 天的次数,若无则显示 0
lag(cnt, 1, 0) OVER (ORDER BY `date`) AS "前日登录数",
-- lead(x, o, d): 获取后 1 天的次数,若无则显示 0
lead(cnt, 1, 0) OVER (ORDER BY `date`) AS "次日登录数"
FROM daily_logins;

-- 结果:
-- date,当日登录数,前日登录数,次日登录数
-- 2026-01-01,100,0,120
-- 2026-01-02,120,100,110
-- 2026-01-03,110,120,150
-- 2026-01-04,150,110,0

-- 订单列表中,找出每个用户的“首次下单金额”和“最近一次下单金额”。
WITH orders AS (
SELECT 'User_A' AS user_id, '2026-01-01' AS order_date, 100 AS price UNION ALL
SELECT 'User_A' AS user_id, '2026-01-05' AS order_date, 250 AS price UNION ALL
SELECT 'User_A' AS user_id, '2026-01-10' AS order_date, 300 AS price
)
SELECT
user_id,
order_date,
price,
-- 获取该用户的第一笔订单金额
first_value(price) OVER (PARTITION BY user_id ORDER BY order_date) AS "首单金额",
-- 获取该用户的最新一笔订单金额
last_value(price) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS "终单金额"
FROM orders;

-- 结果:
-- user_id,order_date,price,首单金额,终单金额
-- User_A,2026-01-01,100,100,300
-- User_A,2026-01-05,250,100,300
-- User_A,2026-01-10,300,100,300


-- 获取每个部门薪资排名第二的员工。
WITH salary_data AS (
SELECT 'Tech' AS dept, 'Alice' AS name, 15000 AS salary UNION ALL
SELECT 'Tech' AS dept, 'Bob' AS name, 12000 AS salary UNION ALL
SELECT 'Tech' AS dept, 'Chris' AS name, 10000 AS salary
)
SELECT
dept,
name,
salary,
-- nth_value(x, n): 获取按薪资降序排列后的第 2 名
nth_value(name, 2) OVER (
PARTITION BY dept
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS "部门榜眼"
FROM salary_data;

-- 结果:
-- dept,name,salary,部门榜眼
-- Tech,Alice,15000,Bob
-- Tech,Bob,12000,Bob
-- Tech,Chris,10000,Bob

分布与统计函数

这类函数用于分析数据在整体中的位置百分比。
函数
计算公式
典型用途
percent_rank()
(rank - 1) / (总行数 - 1)
计算当前行在全集中的相对百分比排名。
cume_dist()
小于等于当前值的行数 / 总行数
计算累积分布,常用于帕累托分析。
WITH score_data AS (
SELECT 'Alice' AS name, 95 AS score UNION ALL
SELECT 'Bob' AS name, 95 AS score UNION ALL
SELECT 'Chris' AS name, 80 AS score UNION ALL
SELECT 'David' AS name, 60 AS score
)
SELECT
name,
score,
-- percent_rank(): (rank - 1) / (total_rows - 1)
percent_rank() OVER (ORDER BY score DESC) AS p_rank,
-- cume_dist(): 小于等于当前值的行数 / 总行数
cume_dist() OVER (ORDER BY score DESC) AS c_dist
FROM score_data;

-- 结果:
-- name,score,p_rank,c_dist
-- Alice,95,0,0.5
-- Bob,95,0,0.5
-- Chris,80,0.66666666666666663,0.75
-- David,60,1,1