首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >加速全文搜索- pgsql

加速全文搜索- pgsql
EN

Stack Overflow用户
提问于 2017-03-03 02:01:09
回答 1查看 122关注 0票数 0

我看到了上百万个线程,用来加速postgresql查询的全文搜索。我试着做每件事,但没有更多的想法。我有一个很大的表(目前有20612971条记录),我用pgsql的全文检索对它进行了搜索,然后按ts_rank_cd排序。我达到了大约3500-4000ms来执行查询。有没有什么办法能让它更快?如果可能的话,我不想使用像狮身人面像或solr这样的外部软体。因此,原生postgresql解决方案是首选:)下面描述了我的表和解释分析选择的示例。

代码语言:javascript
运行
复制
# \d artifacts.item
                                           Table "artifacts.item"
     Column          |            Type             |                          Modifiers                          
-------------------------+-----------------------------+-------------------------------------------------------------
 add_timestamp           | timestamp without time zone | 
 author_account_id       | integer                     | 
 description             | text                        | 
 id                      | integer                     | not null default nextval('artifacts.item_id_seq'::regclass)
 removed_since_timestamp | timestamp without time zone | 
 slug                    | character varying(2044)     | not null
 thumb_height            | integer                     | 
 thumb_path              | character varying(2044)     | default NULL::character varying
 thumb_width             | integer                     | 
 title                   | character varying(2044)     | not null
 search_data             | tsvector                    | 
 tags                    | integer[]                   | 
 is_age_restricted       | boolean                     | not null default false
 is_on_homepage          | boolean                     | not null default false
 is_public               | boolean                     | not null default false
 thumb_filename          | character varying(2044)     | 
 is_removed              | boolean                     | not null default false
Indexes:
    "artifacts_item_add_timestamp_idx" btree (add_timestamp DESC NULLS LAST)
    "artifacts_item_id_idx" btree (id)
    "artifacts_item_is_on_homepage_add_timestamp" btree (is_on_homepage DESC, add_timestamp DESC NULLS LAST)
    "artifacts_item_is_on_homepage_idx" btree (is_on_homepage)
    "artifacts_item_search_results" gin (search_data) WHERE is_public IS TRUE AND is_removed IS FALSE
    "artifacts_item_tags_gin_idx" gin (tags)
    "artifacts_item_thumbs_list" btree (is_public, is_removed, id DESC)
    "index1" btree (add_timestamp)
    "itemIdx" btree (is_public, is_removed, is_age_restricted)
    "item_author_account_id_idx" btree (author_account_id)

分析:

代码语言:javascript
运行
复制
# explain analyze SELECT i.id, 
#     i.title, 
#     i.description, 
#     i.slug, 
#     i.thumb_path, 
#     i.thumb_filename, 
#     CONCAT(
#         i.thumb_path, 
#         '/', 
#         i.thumb_filename
#     ) AS thumb_url, 
#     (CASE WHEN i.thumb_width = 0 THEN 280 ELSE i.thumb_width END) as thumb_width, 
#     (CASE WHEN i.thumb_height = 0 THEN 280 ELSE i.thumb_height END) as thumb_height, 
#     (i.thumb_height > i.thumb_width) AS is_vertical, 
#     i.add_timestamp 
# FROM artifacts.item AS i 
# WHERE i.is_public IS true 
#     AND i.is_removed IS false 
#     AND (i.search_data @@ to_tsquery('public.polish', $$'lego'$$)) 
# ORDER BY ts_rank_cd(i.search_data, to_tsquery('public.polish', $$'lego'$$)) desc, 
#     ts_rank_cd(i.search_data, to_tsquery('public.polish', $$'lego'$$)) desc, 
#     i.add_timestamp DESC NULLS LAST  
# LIMIT 60
# OFFSET 0;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=358061.78..358061.93 rows=60 width=315) (actual time=335.870..335.876 rows=60 loops=1)
   ->  Sort  (cost=358061.78..358357.25 rows=118189 width=315) (actual time=335.868..335.868 rows=60 loops=1)
         Sort Key: (ts_rank_cd(search_data, '''lego'' | ''lega'''::tsquery)), add_timestamp
         Sort Method: top-N heapsort  Memory: 55kB
         ->  Bitmap Heap Scan on item i  (cost=2535.96..353980.19 rows=118189 width=315) (actual time=33.163..308.371 rows=62025 loops=1)
               Recheck Cond: ((search_data @@ '''lego'' | ''lega'''::tsquery) AND (is_public IS TRUE) AND (is_removed IS FALSE))
               ->  Bitmap Index Scan on artifacts_item_search_results  (cost=0.00..2506.42 rows=118189 width=0) (actual time=23.066..23.066 rows=62085 loops=1)
                     Index Cond: (search_data @@ '''lego'' | ''lega'''::tsquery)
 Total runtime: 335.967 ms
(9 rows)

Time: 3444.731 ms
EN

Stack Overflow用户

发布于 2017-03-03 04:07:06

符合条件的行有62025行,必须对它们进行…排序

这将需要一段时间。有没有可能在RAM中拥有整个数据库或至少是索引?那会有帮助的。

票数 0
EN
查看全部 1 条回答
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/42562641

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档