首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >使用PostgreSQL和MySQL的大型表

使用PostgreSQL和MySQL的大型表
EN

Database Administration用户
提问于 2017-06-05 20:47:28
回答 2查看 110关注 0票数 -1

我编写了一个应用程序,它解析imdb (从它们的公共ftp文件)数据,并将其存储到我选择的数据库中。

我的两个数据库表有超过100万条记录(2.316.952和1.392.866)。我的应用程序的目的是在每次用户单击按钮时选择1部电影(其中的数据驻留在两个表中)。所以我的问题是,什么是存储我的数据最有效的方法?

我试过PostgreSQL,我对此很满意。我唯一的问题是很难找到安装了PostgreSQL的廉价主机。

我试过MySQL,但出于某种原因,它比Postgres慢得多。如果Postgres select将花费3.5s,MySQL将超过10s。有什么好主意吗?MySQL使用了innodb引擎。

总结一下:有谁能想到更好的存储数据的方法吗?为什么MySQL慢一点?

编辑:

这有点问题,我使用hibernate/JPA,使用动态查询(hibernate api生成所有SQL)。

至于表格:

代码语言:javascript
复制
CREATE TABLE "movies" (
    "id" INTEGER NOT NULL,
    "name" TEXT NULL DEFAULT NULL,
    "rank" DOUBLE PRECISION NOT NULL,
    "releaseyear" INTEGER NULL DEFAULT NULL,
    "summary" TEXT NULL DEFAULT NULL,
    "votes" BIGINT NOT NULL,
    PRIMARY KEY ("id")
)

CREATE TABLE "movies_genres" (
    "moviejpa_id" INTEGER NOT NULL,
    "genres_id" INTEGER NOT NULL
)

CREATE TABLE "genres" (
    "id" INTEGER NOT NULL,
    "name" VARCHAR(255) NULL DEFAULT NULL,
    PRIMARY KEY ("id")
)

表也由hibernate/jpa创建。

数据示例:

编辑(管理打印hibernate生成的查询):

代码语言:javascript
复制
select
    moviejpa0_.id as col_0_0_ 
from
    movies moviejpa0_ 
inner join
    movies_genres genres1_ 
        on moviejpa0_.id=genres1_.MovieJpa_id 
inner join
    genres genrejpa2_ 
        on genres1_.genres_id=genrejpa2_.id 
where
    (
        genrejpa2_.id in (
            ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ?
        )
    ) 
    and (
        moviejpa0_.rank between 0.0 and 10.0
    ) 
    and (
        moviejpa0_.votes between 0 and 1814484
    )

select
    moviejpa0_.id as id1_4_0_,
    moviejpa0_.name as name2_4_0_,
    moviejpa0_.rank as rank3_4_0_,
    moviejpa0_.releaseYear as releaseY4_4_0_,
    moviejpa0_.summary as summary5_4_0_,
    moviejpa0_.votes as votes6_4_0_,
    genres1_.MovieJpa_id as MovieJpa1_5_1_,
    genrejpa2_.id as genres_i2_5_1_,
    genrejpa2_.id as id1_3_2_,
    genrejpa2_.name as name2_3_2_ 
from
    movies moviejpa0_ 
left outer join
    movies_genres genres1_ 
        on moviejpa0_.id=genres1_.MovieJpa_id 
left outer join
    genres genrejpa2_ 
        on genres1_.genres_id=genrejpa2_.id 
where
    moviejpa0_.id=?

编辑:

Postgre计划:(我已经有一段时间没有直接使用数据库了,我使用了explain SQL,不知道这是否仍然正确)

第一个SQL:

代码语言:javascript
复制
Nested Loop Left Join  (cost=0.43..39238.32 rows=2 width=1108)
Join Filter: (genres1_.genres_id = genrejpa2_.id)
->  Nested Loop Left Join  (cost=0.43..39222.37 rows=2 width=588)
->  Materialize  (cost=0.00..12.10 rows=140 width=520)
    Join Filter: (moviejpa0_.id = genres1_.moviejpa_id)
    ->  Seq Scan on movies_genres genres1_  (cost=0.00..39213.90 rows=2 width=8)
    ->  Seq Scan on genres genrejpa2_  (cost=0.00..11.40 rows=140 width=520)
    ->  Index Scan using movies_pkey on movies moviejpa0_  (cost=0.43..8.45 rows=1 width=580)
    Index Cond: (id = 916424)
    Filter: (moviejpa_id = 916424)

第二个SQL:

代码语言:javascript
复制
Hash Join  (cost=84978.62..150810.17 rows=579193 width=4)
  Hash Cond: (genres1_.moviejpa_id = moviejpa0_.id)
  ->  Hash Join  (cost=17.96..47920.43 rows=579238 width=4)
        Hash Cond: (genres1_.genres_id = genrejpa2_.id)
        ->  Seq Scan on movies_genres genres1_  (cost=0.00..33421.52 rows=2316952 width=8)
        ->  Hash  (cost=17.53..17.53 rows=35 width=4)
              ->  Seq Scan on genres genrejpa2_  (cost=0.00..17.53 rows=35 width=4)
                    Filter: (id = ANY ('{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34}'::integer[]))
  ->  Hash  (cost=62116.86..62116.86 rows=1392384 width=4)
        ->  Seq Scan on movies moviejpa0_  (cost=0.00..62116.86 rows=1392384 width=4)
              Filter: ((rank >= '0'::double precision) AND (rank <= '10'::double precision) AND (votes >= 0) AND (votes <= 1814484))

MySQL计划:

第一个sql:

代码语言:javascript
复制
<row>
    <field name="id">1</field>
    <field name="select_type">SIMPLE</field>
    <field name="table">moviejpa0_</field>
    <field name="partitions" xsi:nil="true" />
    <field name="type">const</field>
    <field name="possible_keys">PRIMARY</field>
    <field name="key">PRIMARY</field>
    <field name="key_len">4</field>
    <field name="ref">const</field>
    <field name="rows">1</field>
    <field name="filtered">100.00</field>
    <field name="Extra" xsi:nil="true" />
</row>
<row>
    <field name="id">1</field>
    <field name="select_type">SIMPLE</field>
    <field name="table">genres1_</field>
    <field name="partitions" xsi:nil="true" />
    <field name="type">ref</field>
    <field name="possible_keys">FK1oevm2ns4a61icpilhyq1dwr7</field>
    <field name="key">FK1oevm2ns4a61icpilhyq1dwr7</field>
    <field name="key_len">4</field>
    <field name="ref">const</field>
    <field name="rows">1</field>
    <field name="filtered">100.00</field>
    <field name="Extra" xsi:nil="true" />
</row>
<row>
    <field name="id">1</field>
    <field name="select_type">SIMPLE</field>
    <field name="table">genrejpa2_</field>
    <field name="partitions" xsi:nil="true" />
    <field name="type">eq_ref</field>
    <field name="possible_keys">PRIMARY</field>
    <field name="key">PRIMARY</field>
    <field name="key_len">4</field>
    <field name="ref">test.genres1_.genres_id</field>
    <field name="rows">1</field>
    <field name="filtered">100.00</field>
    <field name="Extra" xsi:nil="true" />
</row>

第二个sql:

代码语言:javascript
复制
        <field name="id">1</field>
    <field name="select_type">SIMPLE</field>
    <field name="table">moviejpa0_</field>
    <field name="partitions" xsi:nil="true" />
    <field name="type">ALL</field>
    <field name="possible_keys">PRIMARY</field>
    <field name="key" xsi:nil="true" />
    <field name="key_len" xsi:nil="true" />
    <field name="ref" xsi:nil="true" />
    <field name="rows">1321403</field>
    <field name="filtered">1.23</field>
    <field name="Extra">Using where</field>
</row>
<row>
    <field name="id">1</field>
    <field name="select_type">SIMPLE</field>
    <field name="table">genres1_</field>
    <field name="partitions" xsi:nil="true" />
    <field name="type">ref</field>
    <field name="possible_keys">FKabwobqnegu888274nercpwc9p,FK1oevm2ns4a61icpilhyq1dwr7</field>
    <field name="key">FK1oevm2ns4a61icpilhyq1dwr7</field>
    <field name="key_len">4</field>
    <field name="ref">test.moviejpa0_.id</field>
    <field name="rows">1</field>
    <field name="filtered">100.00</field>
    <field name="Extra">Using where</field>
</row>
<row>
    <field name="id">1</field>
    <field name="select_type">SIMPLE</field>
    <field name="table">genrejpa2_</field>
    <field name="partitions" xsi:nil="true" />
    <field name="type">eq_ref</field>
    <field name="possible_keys">PRIMARY</field>
    <field name="key">PRIMARY</field>
    <field name="key_len">4</field>
    <field name="ref">test.genres1_.genres_id</field>
    <field name="rows">1</field>
    <field name="filtered">100.00</field>
    <field name="Extra">Using index</field>
</row>

编辑: 2017-06-13设法找出,这个查询占用了大部分时间:

代码语言:javascript
复制
select
    moviejpa0_.id as col_0_0_ 
from
    movies moviejpa0_ 
inner join
    movies_genres genres1_ 
        on moviejpa0_.id=genres1_.MovieJpa_id 
inner join
    genres genrejpa2_ 
        on genres1_.genres_id=genrejpa2_.id 
where
    (
        genrejpa2_.id in (
            ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ?
        )
    ) 
    and (
        moviejpa0_.rank between 0.0 and 10.0
    ) 
    and (
        moviejpa0_.votes between 0 and 1814484
    )

CREATE TABLE "movies_genres" (
    "genres_id" INTEGER NOT NULL,
    "moviejpa_id" INTEGER NOT NULL,
    INDEX "" ("moviejpa_id"),
    UNIQUE INDEX "UNIQUE" ("genres_id", "moviejpa_id")
)

解释:

代码语言:javascript
复制
 QUERY PLAN  
 -  
 Hash Join  (cost=85187.47..150712.33 rows=576153 width=4)  
   Hash Cond: (genres1_.genres_id = moviejpa0_.id)  
   ->  Hash Join  (cost=17.96..47666.67 rows=576165 width=4)  
         Hash Cond: (genres1_.moviejpa_id = genrejpa2_.id)  
         ->  Seq Scan on movies_genres genres1_  (cost=0.00..33244.59 rows=2304659 width=8)  
         ->  Hash  (cost=17.53..17.53 rows=35 width=4)  
               ->  Seq Scan on genres genrejpa2_  (cost=0.00..17.53 rows=35 width=4)  
                     Filter: (id = ANY ('{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34}'::integer[]))  
   ->  Hash  (cost=62271.76..62271.76 rows=1395660 width=4)  
         ->  Seq Scan on movies moviejpa0_  (cost=0.00..62271.76 rows=1395660 width=4)  
               Filter: ((rank >= '0'::double precision) AND (rank <= '10'::double precision) AND (votes >= 0) AND (votes <= 1816967))  

解释分析:

代码语言:javascript
复制
 QUERY PLAN  
 -  
 Hash Join  (cost=85187.47..150712.33 rows=576153 width=4) (actual time=619.514..2039.717 rows=2304659 loops=1)  
   Hash Cond: (genres1_.genres_id = moviejpa0_.id)  
   ->  Hash Join  (cost=17.96..47666.67 rows=576165 width=4) (actual time=0.099..424.101 rows=2304659 loops=1)  
         Hash Cond: (genres1_.moviejpa_id = genrejpa2_.id)  
         ->  Seq Scan on movies_genres genres1_  (cost=0.00..33244.59 rows=2304659 width=8) (actual time=0.025..108.536 rows=2304659 loops=1)  
         ->  Hash  (cost=17.53..17.53 rows=35 width=4) (actual time=0.038..0.038 rows=35 loops=1)  
               Buckets: 1024  Batches: 1  Memory Usage: 10kB  
               ->  Seq Scan on genres genrejpa2_  (cost=0.00..17.53 rows=35 width=4) (actual time=0.013..0.027 rows=35 loops=1)  
                     Filter: (id = ANY ('{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34}'::integer[]))  
   ->  Hash  (cost=62271.76..62271.76 rows=1395660 width=4) (actual time=615.854..615.854 rows=1396310 loops=1)  
         Buckets: 131072  Batches: 32  Memory Usage: 2562kB  
         ->  Seq Scan on movies moviejpa0_  (cost=0.00..62271.76 rows=1395660 width=4) (actual time=0.065..436.606 rows=1396310 loops=1)  
               Filter: ((rank >= '0'::double precision) AND (rank <= '10'::double precision) AND (votes >= 0) AND (votes <= 1816967))  
 Planning time: 0.964 ms  
 Execution time: 2071.123 ms  

第二个查询:

代码语言:javascript
复制
 EXPLAIN
 QUERY PLAN  
 -  
 Nested Loop Left Join  (cost=12.92..33.08 rows=2 width=1112)  
   Join Filter: (moviejpa0_.id = genres1_.genres_id)  
   ->  Index Scan using movies_pkey on movies moviejpa0_  (cost=0.43..8.45 rows=1 width=584)  
         Index Cond: (id = 481380)  
   ->  Hash Right Join  (cost=12.49..24.61 rows=2 width=524)  
         Hash Cond: (genrejpa2_.id = genres1_.moviejpa_id)  
         ->  Seq Scan on genres genrejpa2_  (cost=0.00..11.40 rows=140 width=520)  
         ->  Hash  (cost=12.46..12.46 rows=2 width=8)  
               ->  Index Only Scan using movies_genres_pkey on movies_genres genres1_  (cost=0.43..12.46 rows=2 width=8)  
                     Index Cond: (genres_id = 481380) 

 EXPLAIN analyze            
 QUERY PLAN  
 -  
 Nested Loop Left Join  (cost=12.92..33.08 rows=2 width=1112) (actual time=0.883..0.910 rows=3 loops=1)  
   Join Filter: (moviejpa0_.id = genres1_.genres_id)  
   ->  Index Scan using movies_pkey on movies moviejpa0_  (cost=0.43..8.45 rows=1 width=584) (actual time=0.219..0.221 rows=1 loops=1)  
         Index Cond: (id = 481380)  
   ->  Hash Right Join  (cost=12.49..24.61 rows=2 width=524) (actual time=0.655..0.679 rows=3 loops=1)  
         Hash Cond: (genrejpa2_.id = genres1_.moviejpa_id)  
         ->  Seq Scan on genres genrejpa2_  (cost=0.00..11.40 rows=140 width=520) (actual time=0.028..0.034 rows=35 loops=1)  
         ->  Hash  (cost=12.46..12.46 rows=2 width=8) (actual time=0.589..0.589 rows=3 loops=1)  
               Buckets: 1024  Batches: 1  Memory Usage: 9kB  
               ->  Index Only Scan using movies_genres_pkey on movies_genres genres1_  (cost=0.43..12.46 rows=2 width=8) (actual time=0.557..0.563 rows=3 loops=1)  
                     Index Cond: (genres_id = 481380)  
                     Heap Fetches: 3  
 Planning time: 0.689 ms  
 Execution time: 1.026 ms  
EN

回答 2

Database Administration用户

发布于 2017-06-06 04:27:36

使用PostgreSQL,

  1. 不要使用双引号
  2. 将主键标记为这样。
  3. rank可以是real类型。
  4. 永远都不需要NULL DEFAULT NULL。这是默认的。
  5. 在movies_genres上添加PRIMARY KEY (genres_id, moviejpa_id)
  6. 确保类型有一个id的主键,它在您的模式中,但从您的计划看,它似乎不存在。
  7. 您可能需要text而不是varchar(255)

你可以像这样快速修复你现在的设置,

代码语言:javascript
复制
CREATE INDEX ON movies_genres(MovieJpa_id);
ALTER TABLE genres ADD PRIMARY KEY (id);
ALTER TABLE movies_genres ADD PRIMARY KEY (genres_id, moviejpa_id);
ANALYZE movies_genres;
ANALYZE genres;

另外,还应该添加show work mem的结果;

票数 3
EN

Database Administration用户

发布于 2017-06-06 01:26:28

甚至3.5秒也是太多了。你需要索引。

这是很多:电影和体裁之间的许多映射,对吗?

代码语言:javascript
复制
CREATE TABLE "movies_genres" (
    "moviejpa_id" INTEGER NOT NULL,
    "genres_id" INTEGER NOT NULL
)

它应该是(至少对于MySQL):

代码语言:javascript
复制
CREATE TABLE `movies_genres` (
    `moviejpa_id` INT UNSIGNED NOT NULL,
    `genres_id` SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY(moviejpa_id, genres_id),
    INDEX(genres_id, moviejpa_id)
) ENGINE=InnoDB;

有关它的更多讨论,请参见

请为EXPLAIN SELECT ...尝试提供MySQL。

票数 1
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/175485

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档