PostgreSQL 同一种SQL为什么这样写会提升45%性能 --程序员和DBA思维方式不同决定

AustinDatabases

发布于 2024-05-20 15:13:06

510

发布于 2024-05-20 15:13:06

SQL语句，简单的大家都会写，但如果是复杂的SQL语句，撰写起来很多程序员就不会了，甚至一些DBA也不会写，今天我们来模拟一些个别的例子来看看一些复杂的SQL语句怎么写。同时最后我们演示一下复杂的SQL 程序员和 DBA 在专项的思维模式不同导致的运行效率不同的案例。（想要看程序员和DBA的SQL撰写的比较直接到练习3）

我们先循序渐进，热身一下，在开始写SQL之前我们，需要测试数据, 我们用三个简单的表以及相关的数据来去演示一些复杂的SQL怎么写。下面我们创建一些表，以及一些数据方便我们开始撰写一些比较复杂的SQL

CREATE TABLE students (
    student_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    age INTEGER NOT NULL,
    gender VARCHAR(10) NOT NULL
);

CREATE TABLE subjects (
    subject_id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);


CREATE TABLE scores (
    score_id SERIAL PRIMARY KEY,
    student_id INTEGER,
    subject_id INTEGER,
    score INTEGER NOT NULL
);

INSERT INTO students (student_id, name, age, gender) VALUES
    (1, 'John', 20, 'Male'),
    (2, 'Alice', 22, 'Female'),
    (3, 'Bob', 21, 'Male'),
    (4, 'Emily', 23, 'Female');

-- 插入科目数据
INSERT INTO subjects (subject_id, name) VALUES
    (1, 'Math'),
    (2, 'English'),
    (3, 'Science');

-- 插入成绩数据
INSERT INTO scores (score_id, student_id, subject_id, score) VALUES
    (1, 1, 1, 95),
    (2, 1, 2, 88),
    (3, 1, 3, 92),
    (4, 2, 1, 80),
    (5, 2, 2, 85),
    (6, 2, 3, 90),
    (7, 3, 1, 92),
    (8, 3, 2, 90),
    (9, 3, 3, 94),
    (10, 4, 1, 85),
    (11, 4, 2, 88),
    (12, 4, 3, 82),
    (13, 1, 1, 90),
    (14, 1, 2, 85),
    (15, 1, 3, 88),
    (16, 2, 1, 92),
    (17, 2, 2, 80),
    (18, 2, 3, 85),
    (19, 3, 1, 85),
    (20, 3, 2, 92),
    (21, 3, 3, 90),
    (22, 4, 1, 88),
    (23, 4, 2, 85),
    (24, 4, 3, 86);

练习一：

请撰写当前学生分数中排名前两位的学生成绩（数学成绩），并且按照排名顺序进行展示。

在SQL中这个需求是比较简单的，在我们使用的开源和商业数据库都有对这样的要求有特殊的函数来进行支持，rank ,rank 函数是对表中指定的值进行排序，按照指定值的排序展示，并添加rank 列来展示排序的数字。

下面的是利用这个函数来对数学的成绩进行排序，然后取其中排名等于1 和2的进行展示。

WITH ranked_scores AS (
    SELECT
        score_id,
        student_id,
        subject_id,
        score,
        RANK() OVER (ORDER BY score DESC) AS rank
    FROM
        scores
    WHERE
        subject_id = 1
)
SELECT
    score_id,
    student_id,
    subject_id,
    score
FROM
    ranked_scores
WHERE
    rank <= 2;

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on ranked_scores  (cost=0.14..3.58 rows=1 width=16) (actual time=0.027..0.032 rows=3 loops=1)
   ->  WindowAgg  (cost=0.14..3.57 rows=1 width=24) (actual time=0.026..0.030 rows=3 loops=1)
         Run Condition: (rank() OVER (?) <= 2)
         ->  Index Scan Backward using idx_subjects_score on scores  (cost=0.14..3.56 rows=1 width=16) (actual time=0.015..0.019 rows=5 loops=1)
               Filter: (subject_id = 1)
               Rows Removed by Filter: 9
 Planning Time: 0.428 ms
 Execution Time: 0.106 ms

练习2 展示考分中，英语成绩最低，但通过数学成绩进行排名从高的人的学号，姓名和数学分数

这个语句的主要撰写的方案是，先对要进行查询的数据范围进行缩小，lowestEnglistscorestudents 就是在对最低的英语成绩进行筛选，这里的distinct on 主要是对student_id 进行去重，防止一个学生在不同的学期的英语成绩都是最低的，然后最后展示的不是两个人而是一个人两次占用的最低英语成绩的名额，基于POSTGRESQL中在order by 语句中的去重不能直接写distinct 只能使用distinct on 的表达方式。

在获得最低英语成绩的两个学生的ID后，在通过mathscoresforlowestenglish 来对要展示的数据进行组合，这里只获取英语最低成绩的两个人的ID 与整体数据进行LEFT JOIN 后只展示数学成绩，最后在获得数据后，对数据成绩进行倒序排序完成整个语句的撰写

sql_test=# WITH LowestEnglishScoreStudents AS (
    SELECT DISTINCT ON (student_id)
        student_id
    FROM
        scores
    WHERE
        subject_id = 2
    ORDER BY
        student_id, score ASC
    LIMIT 2
),
MathScoresForLowestEnglish AS (
    SELECT
        s.student_id,
        st.name,
        s.score AS math_score
    FROM
        scores s
    JOIN
        LowestEnglishScoreStudents les ON s.student_id = les.student_id
    JOIN
        students st ON s.student_id = st.student_id
    WHERE
        s.subject_id = 1
)
SELECT
    student_id,
    name,
    math_score
FROM
    MathScoresForLowestEnglish
ORDER BY
    math_score DESC;
 student_id | name  | math_score 
------------+-------+------------
          1 | John  |         95
          2 | Alice |         92
          1 | John  |         90
          2 | Alice |         80
(4 rows)

                                                                       QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5.96..5.96 rows=1 width=226) (actual time=0.116..0.118 rows=4 loops=1)
   Sort Key: s.score DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=3.85..5.95 rows=1 width=226) (actual time=0.094..0.109 rows=4 loops=1)
         Join Filter: (scores.student_id = s.student_id)
         ->  Nested Loop  (cost=3.71..5.76 rows=1 width=226) (actual time=0.086..0.094 rows=2 loops=1)
               ->  Limit  (cost=3.57..3.57 rows=1 width=8) (actual time=0.063..0.067 rows=2 loops=1)
                     ->  Unique  (cost=3.57..3.57 rows=1 width=8) (actual time=0.061..0.064 rows=2 loops=1)
                           ->  Sort  (cost=3.57..3.57 rows=1 width=8) (actual time=0.060..0.061 rows=3 loops=1)
                                 Sort Key: scores.student_id, scores.score
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Index Scan using idx_student_id on scores  (cost=0.14..3.56 rows=1 width=8) (actual time=0.022..0.029 rows=8 loops=1)
                                       Filter: (subject_id = 2)
                                       Rows Removed by Filter: 16
               ->  Index Scan using students_pkey on students st  (cost=0.15..2.17 rows=1 width=222) (actual time=0.011..0.011 rows=1 loops=2)
                     Index Cond: (student_id = scores.student_id)
         ->  Index Scan using idx_student_id on scores s  (cost=0.14..0.17 rows=1 width=8) (actual time=0.003..0.005 rows=2 loops=2)
               Index Cond: (student_id = st.student_id)
               Filter: (subject_id = 1)
               Rows Removed by Filter: 4
 Planning Time: 0.590 ms
 Execution Time: 0.164 ms
(22 rows)

练习 3 我们对于英语最差，的同学的数学成绩进行平均值的计算并展示

下面的语句是程序员一般撰写的方式，然后后面我们用DBA对这个语句进行优化，大家对比的看

    sql_test=# WITH LowestEnglishScoreStudents AS (
    SELECT DISTINCT ON (student_id)
        student_id
    FROM
        scores
    WHERE
        subject_id = 2
    ORDER BY
        student_id, score ASC
    LIMIT 2
),
MathScoresForLowestEnglish AS (
    SELECT
        s.student_id,
        st.name,
        (SELECT AVG(score) FROM scores WHERE subject_id = 1 AND student_id = s.student_id) AS avg_math_score
    FROM
        scores s
    JOIN
        LowestEnglishScoreStudents les ON s.student_id = les.student_id
    JOIN
        students st ON s.student_id = st.student_id
    WHERE
        s.subject_id = 1
)
SELECT
    distinct(student_id),name,avg_math_score
FROM
    MathScoresForLowestEnglish;
 student_id | name  |   avg_math_score    
------------+-------+---------------------
          1 | John  | 92.5000000000000000
          2 | Alice | 86.0000000000000000

我们可以得出我们要的结果，然后我们看看语句的撰写中的执行计划的分析

sql_test=# explain analyze WITH LowestEnglishScoreStudents AS (
    SELECT DISTINCT ON (student_id)
        student_id
    FROM
        scores
    WHERE
        subject_id = 2
    ORDER BY
        student_id, score ASC
    LIMIT 2
),
MathScoresForLowestEnglish AS (
    SELECT
        s.student_id,
        st.name,
        (SELECT AVG(score) FROM scores WHERE subject_id = 1 AND student_id = s.student_id) AS avg_math_score
    FROM
        scores s
    JOIN
        LowestEnglishScoreStudents les ON s.student_id = les.student_id
    JOIN
        students st ON s.student_id = st.student_id
    WHERE
        s.subject_id = 1
)
SELECT
    distinct(student_id),name,avg_math_score
FROM
    MathScoresForLowestEnglish;
                                                                           QUERY PLAN                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=8.13..8.14 rows=1 width=254) (actual time=0.134..0.140 rows=2 loops=1)
   ->  Sort  (cost=8.13..8.13 rows=1 width=254) (actual time=0.134..0.135 rows=4 loops=1)
         Sort Key: s.student_id, st.name, ((SubPlan 1))
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=3.85..8.12 rows=1 width=254) (actual time=0.088..0.122 rows=4 loops=1)
               Join Filter: (scores.student_id = s.student_id)
               ->  Nested Loop  (cost=3.71..5.76 rows=1 width=226) (actual time=0.067..0.073 rows=2 loops=1)
                     ->  Limit  (cost=3.57..3.57 rows=1 width=8) (actual time=0.055..0.058 rows=2 loops=1)
                           ->  Unique  (cost=3.57..3.57 rows=1 width=8) (actual time=0.053..0.055 rows=2 loops=1)
                                 ->  Sort  (cost=3.57..3.57 rows=1 width=8) (actual time=0.052..0.053 rows=3 loops=1)
                                       Sort Key: scores.student_id, scores.score
                                       Sort Method: quicksort  Memory: 25kB
                                       ->  Index Scan using idx_student_id on scores  (cost=0.14..3.56 rows=1 width=8) (actual time=0.036..0.043 rows=8 loops=1)
                                             Filter: (subject_id = 2)
                                             Rows Removed by Filter: 16
                     ->  Index Scan using students_pkey on students st  (cost=0.15..2.17 rows=1 width=222) (actual time=0.005..0.005 rows=1 loops=2)
                           Index Cond: (student_id = scores.student_id)
               ->  Index Scan using idx_student_id on scores s  (cost=0.14..0.17 rows=1 width=4) (actual time=0.002..0.004 rows=2 loops=2)
                     Index Cond: (student_id = st.student_id)
                     Filter: (subject_id = 1)
                     Rows Removed by Filter: 4
               SubPlan 1
                 ->  Aggregate  (cost=2.16..2.17 rows=1 width=32) (actual time=0.007..0.007 rows=1 loops=4)
                       ->  Index Scan using idx_student_id on scores scores_1  (cost=0.14..2.16 rows=1 width=4) (actual time=0.002..0.003 rows=2 loops=4)
                             Index Cond: (student_id = s.student_id)
                             Filter: (subject_id = 1)
                             Rows Removed by Filter: 4
 Planning Time: 0.657 ms
 Execution Time: 0.214 ms
(29 rows)

上面的执行计划，有一些复杂，有没有方式将语句进行改变，降低执行计划的复杂性。

sql_test=# explain analyze WITH LowestEnglishScoreStudents AS (
    SELECT DISTINCT ON (student_id)
        student_id
    FROM
        scores
    WHERE
        subject_id = 2
    ORDER BY
        student_id, score ASC
    LIMIT 2
),
MathScoresForLowestEnglish AS (
    SELECT
        s.student_id,
        AVG(s.score) AS avg_math_score
    FROM
        scores s
    JOIN
        LowestEnglishScoreStudents les ON s.student_id = les.student_id
    WHERE
        s.subject_id = 1
    GROUP BY
        s.student_id
)
SELECT
    m.student_id,
    st.name,
    m.avg_math_score
FROM
    MathScoresForLowestEnglish m
JOIN
    students st ON m.student_id = st.student_id;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.85..7.96 rows=1 width=254) (actual time=0.080..0.091 rows=2 loops=1)
   ->  GroupAggregate  (cost=3.71..5.77 rows=1 width=36) (actual time=0.072..0.079 rows=2 loops=1)
         Group Key: s.student_id
         ->  Nested Loop  (cost=3.71..5.75 rows=1 width=8) (actual time=0.054..0.066 rows=4 loops=1)
               ->  Limit  (cost=3.57..3.57 rows=1 width=8) (actual time=0.045..0.049 rows=2 loops=1)
                     ->  Unique  (cost=3.57..3.57 rows=1 width=8) (actual time=0.044..0.047 rows=2 loops=1)
                           ->  Sort  (cost=3.57..3.57 rows=1 width=8) (actual time=0.043..0.044 rows=3 loops=1)
                                 Sort Key: scores.student_id, scores.score
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Index Scan using idx_student_id on scores  (cost=0.14..3.56 rows=1 width=8) (actual time=0.024..0.032 rows=8 loops=1)
                                       Filter: (subject_id = 2)
                                       Rows Removed by Filter: 16
               ->  Index Scan using idx_student_id on scores s  (cost=0.14..2.16 rows=1 width=8) (actual time=0.004..0.006 rows=2 loops=2)
                     Index Cond: (student_id = scores.student_id)
                     Filter: (subject_id = 1)
                     Rows Removed by Filter: 4
   ->  Index Scan using students_pkey on students st  (cost=0.15..2.17 rows=1 width=222) (actual time=0.004..0.004 rows=1 loops=2)
         Index Cond: (student_id = s.student_id)
 Planning Time: 0.339 ms
 Execution Time: 0.147 ms
(20 rows)

这两个语句的结果都是一样的，但是下面的语句明显从执行的时间，和计划的产生都要好过上面开发人员的撰写的语句。

DBA 撰写

WITH LowestEnglishScoreStudents AS (
    SELECT DISTINCT ON (student_id)
        student_id
    FROM
        scores
    WHERE
        subject_id = 2
    ORDER BY
        student_id, score ASC
    LIMIT 2
),
MathScoresForLowestEnglish AS (
    SELECT
        s.student_id,
        AVG(s.score) AS avg_math_score
    FROM
        scores s
    JOIN
        LowestEnglishScoreStudents les ON s.student_id = les.student_id
    WHERE
        s.subject_id = 1
    GROUP BY
        s.student_id
)
SELECT
    m.student_id,
    st.name,
    m.avg_math_score
FROM
    MathScoresForLowestEnglish m
JOIN
    students st ON m.student_id = st.student_id;

程序员撰写

 WITH LowestEnglishScoreStudents AS (
    SELECT DISTINCT ON (student_id)
        student_id
    FROM
        scores
    WHERE
        subject_id = 2
    ORDER BY
        student_id, score ASC
    LIMIT 2
),
MathScoresForLowestEnglish AS (
    SELECT
        s.student_id,
        st.name,
        (SELECT AVG(score) FROM scores WHERE subject_id = 1 AND student_id = s.student_id) AS avg_math_score
    FROM
        scores s
    JOIN
        LowestEnglishScoreStudents les ON s.student_id = les.student_id
    JOIN
        students st ON s.student_id = st.student_id
    WHERE
        s.subject_id = 1
)
SELECT
    distinct(student_id),name,avg_math_score
FROM
    MathScoresForLowestEnglish;

那么这两条语句最大的不同在于在平均成绩的计算上，DBA 撰写的方式是，条件过滤合并的方式，而程序员的撰写方式明显是在使用内部子查询来完成运算，相当于每一条数据都要进行计算，导致在执行的过程中多次访问和扫描 score 表。

通过上面的语句来进行演示，明显DBA的撰写的方式从语句的理解和执行的效率都高于开发人员撰写的语句，那么为什么开发人员一般撰写语句都撰写成第二种方式，主要还是由于思维的方式，对象，方法，类的思维方式导致程序员写不出高效率的SQL。

这里没有贬低程序员的意思，只是职业不同思维的角度不同导致的差异。

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2024-05-19，如有侵权请联系 cloudcommunity@tencent.com 删除

postgresql