
postgresql COUNT(DISTINCT ...) very slow

Stack Overflow user
Asked 2012-06-29 01:52:45
4 answers · 218.2K views · 0 followers · 196 votes

I have a very simple SQL query:

SELECT COUNT(DISTINCT x) FROM table;

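My table has about 1.5 million rows. This query runs quite slowly; it takes about 7.5 s, compared to

 SELECT COUNT(x) FROM table;

which takes about 435 ms. Is there any way to change my query to improve performance? I've tried grouping and doing a regular count, as well as putting an index on x; both take the same 7.5 s to execute.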


4 Answers

Stack Overflow user

Posted 2013-02-06 23:17:10

You can use this:

SELECT COUNT(*) FROM (SELECT DISTINCT column_name FROM table_name) AS temp;

This is much faster than:

COUNT(DISTINCT column_name)
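
One thing to watch when swapping the two forms: they are not strictly equivalent if the column can contain NULLs. COUNT(DISTINCT x) ignores NULLs, while SELECT DISTINCT keeps a single NULL row that COUNT(*) then counts. A small sketch, using a made-up temp table `demo` (not from the question):

```sql
-- Hypothetical table, for illustration only.
CREATE TEMP TABLE demo (x integer);
INSERT INTO demo VALUES (1), (1), (2), (NULL);

-- Aggregate form: NULLs are ignored, so this returns 2.
SELECT COUNT(DISTINCT x) FROM demo;

-- Subquery form: the NULL group survives DISTINCT, so this returns 3.
SELECT COUNT(*) FROM (SELECT DISTINCT x FROM demo) AS temp;

-- Filtering NULLs in the subquery makes it match the aggregate form (2).
SELECT COUNT(*)
FROM (SELECT DISTINCT x FROM demo WHERE x IS NOT NULL) AS temp;
```

If the column is declared NOT NULL, the extra filter is unnecessary.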
Votes: 389

Stack Overflow user

Posted 2012-06-29 02:32:28

-- My default settings (this is basically a single-session machine, so work_mem is pretty high)
SET effective_cache_size='2048MB';
SET work_mem='16MB';

\echo original
EXPLAIN ANALYZE
SELECT
        COUNT (distinct val) as aantal
FROM one
        ;

\echo group by+count(*)
EXPLAIN ANALYZE
SELECT
        distinct val
       -- , COUNT(*)
FROM one
GROUP BY val;

\echo with CTE
EXPLAIN ANALYZE
WITH agg AS (
    SELECT distinct val
    FROM one
    GROUP BY val
    )
SELECT COUNT (*) as aantal
FROM agg
        ;

Results:

original
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=36448.06..36448.07 rows=1 width=4) (actual time=1766.472..1766.472 rows=1 loops=1)
   ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=31.371..185.914 rows=1499845 loops=1)
 Total runtime: 1766.642 ms
(3 rows)

group by+count(*)
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=36464.31..36477.31 rows=1300 width=4) (actual time=412.470..412.598 rows=1300 loops=1)
   ->  HashAggregate  (cost=36448.06..36461.06 rows=1300 width=4) (actual time=412.066..412.203 rows=1300 loops=1)
         ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=26.134..166.846 rows=1499845 loops=1)
 Total runtime: 412.686 ms
(4 rows)

with CTE
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=36506.56..36506.57 rows=1 width=0) (actual time=408.239..408.239 rows=1 loops=1)
   CTE agg
     ->  HashAggregate  (cost=36464.31..36477.31 rows=1300 width=4) (actual time=407.704..407.847 rows=1300 loops=1)
           ->  HashAggregate  (cost=36448.06..36461.06 rows=1300 width=4) (actual time=407.320..407.467 rows=1300 loops=1)
                 ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=24.321..165.256 rows=1499845 loops=1)
   ->  CTE Scan on agg  (cost=0.00..26.00 rows=1300 width=0) (actual time=407.707..408.154 rows=1300 loops=1)
 Total runtime: 408.300 ms
(7 rows)

The same plan as for the CTE could probably also be produced by other methods (window functions).
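
The table `one` itself isn't shown in this answer. To rerun the comparison, something like the following might stand in for it. The row count (1,499,845) and the number of distinct values (1,300) are taken from the plans above; the column type and the way the values are distributed are guesses.

```sql
-- Assumed stand-in for the answer's table "one": an integer column val,
-- 1,499,845 rows cycling through 1,300 distinct values (0..1299).
CREATE TABLE one AS
SELECT (g % 1300)::integer AS val
FROM generate_series(1, 1499845) AS g;

-- Refresh planner statistics so the row estimates are realistic.
ANALYZE one;

-- Sanity check: both forms should report 1300.
SELECT COUNT(DISTINCT val) FROM one;
SELECT COUNT(*) FROM (SELECT DISTINCT val FROM one) AS t;
```

Timings will of course differ from the ones posted, depending on hardware, PostgreSQL version, and the work_mem setting.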

Votes: 13

Stack Overflow user

Posted 2012-07-01 02:21:16

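If your count(distinct(x)) is significantly slower than count(x), you can speed this query up by using triggers to maintain the x value counts in a separate table, for example table_name_x_counts (x integer not null, x_count int not null). But your write performance will suffer, and if you update multiple x values in a single transaction you'd need to do so in some explicit order to avoid possible deadlocks.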
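
A rough sketch of what such a trigger could look like. Only the counts table comes from the answer; the source table definition, the function and trigger names, and the primary key on x are assumptions made for illustration, and the ON CONFLICT clause needs PostgreSQL 9.5 or later (newer than this answer).

```sql
-- Hypothetical source table; the answer only names the counts table.
CREATE TABLE table_name (x integer NOT NULL);

-- Counts table from the answer, with a primary key added (an assumption)
-- so that ON CONFLICT has a unique index to work against.
CREATE TABLE table_name_x_counts (
    x       integer PRIMARY KEY,
    x_count integer NOT NULL
);

CREATE FUNCTION table_name_x_counts_maintain() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- A new x value appears: insert it with count 1 or bump its count.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO table_name_x_counts AS c (x, x_count)
        VALUES (NEW.x, 1)
        ON CONFLICT (x) DO UPDATE SET x_count = c.x_count + 1;
    END IF;
    -- An old x value goes away: decrement, and drop the row at zero.
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        UPDATE table_name_x_counts SET x_count = x_count - 1 WHERE x = OLD.x;
        DELETE FROM table_name_x_counts WHERE x = OLD.x AND x_count = 0;
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER ROW triggers
END;
$$;

CREATE TRIGGER table_name_x_counts_trg
AFTER INSERT OR DELETE OR UPDATE OF x ON table_name
FOR EACH ROW EXECUTE PROCEDURE table_name_x_counts_maintain();

-- The distinct count then becomes a count over the small side table.
SELECT COUNT(*) FROM table_name_x_counts;
```

Each write to table_name now also touches a row in the counts table, which is where the write overhead and the deadlock risk mentioned above come from. Writing rows in a consistent order of x within a transaction is one way to get the explicit ordering the answer asks for.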

Votes: 4

Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/11250253
