首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >哪些适当的作用域/索引可以帮助通过rails查找更多的作用域?

哪些适当的作用域/索引可以帮助通过rails查找更多的作用域?
EN

Stack Overflow用户
提问于 2013-01-19 19:40:35
回答 7查看 484关注 0票数 6

我有一个相对较大的4深度关系数据设置,如下所示:

代码语言:javascript
运行
复制
ClientApplication         has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs                 has_many => Logs

client_applications:(潜在的1000条记录)

成型机- ...

成型机- account_id

成型机- public_key

成型机- deleted_at

client_application_versions:(可能有10,000个记录)

成型机- ...

成型机- client_application_id

成型机- public_key

成型机- deleted_at

cloud_logs:(可能有1,000,000个记录)

再来.

成型机- client_application_version_id

成型机- public_key

成型机- deleted_at

logs:(可能有10亿的记录)

成型机- ...

成型机- cloud_log_id

成型机- public_key

成型机- time_stamp

成型机- deleted_at

我还在开发,所以结构和设置不是一成不变的,但我希望它是好的。使用Rails 3.2.11和InnoDB MySQL。数据库中当前填充了一小部分数据(与最终的db大小相比) (logs只有50万行),我有4个作用域查询,其中3个是有问题的,用于检索日志。

  1. 抓取日志的第一页,按时间戳排序,受account_idclient_application.public_keyclient_application_version.public_key的限制(超过100秒)
  2. 抓取日志的第一页,按时间戳排序,受account_idclient_application.public_key限制(超过100秒)
  3. 抓取日志的第一页,按时间戳排序,受account_id限制(超过100秒)
  4. 抓取日志的第一页,按时间戳排序(~2秒)

我使用rails作用域来帮助进行以下调用:

代码语言:javascript
运行
复制
  scope :account_id, proc {|account_id| joins(:client_application).where("client_applications.account_id = ?", account_id) }
  scope :client_application_key, proc {|client_application_key| joins(:client_application).where("client_applications.public_key = ?", client_application_key) }
  scope :client_application_version_key, proc {|client_application_version_key| joins(:client_application_version).where("client_application_versions.public_key = ?", client_application_version_key) }

  default_scope order('logs.timestamp DESC')

我在public_key上的每个表上都有索引。我在logs表上有几个索引,包括优化器喜欢使用的索引(index_logs_on_cloud_log_id),但是这些查询仍然要花费很长时间才能运行。

下面是如何在rails console中调用该方法

代码语言:javascript
运行
复制
Log.account_id(1).client_application_key('p0kZudG0').client_application_version_key('0HgoJRyE').page(1)

..。下面是rails将其转化为什么:

代码语言:javascript
运行
复制
SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `cloud_logs`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` ON `client_application_versions`.`id` = `cloud_logs`.`client_application_version_id` INNER JOIN `client_applications` ON `client_applications`.`id` = `client_application_versions`.`client_application_id` INNER JOIN `cloud_logs` `cloud_logs_logs_join` ON `cloud_logs_logs_join`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` `client_application_versions_logs` ON `client_application_versions_logs`.`id` = `cloud_logs_logs_join`.`client_application_version_id` WHERE (logs.deleted_at IS NULL) AND (client_applications.account_id = 1) AND (client_applications.public_key = 'p0kZudG0') AND (client_application_versions.public_key = '0HgoJRyE') ORDER BY logs.timestamp DESC LIMIT 100 OFFSET 0

..。这是这个查询的EXPLAIN语句。

代码语言:javascript
运行
复制
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table                            | type   | possible_keys                                                                                                                                         | key                                               | key_len | ref                                                                    | rows | Extra                                        |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | client_application_versions      | ref    | PRIMARY,index_client_application_versions_on_client_application_id,index_client_application_versions_on_public_key                                    | index_client_application_versions_on_public_key   | 768     | const                                                                  |    1 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | client_applications              | eq_ref | PRIMARY,index_client_applications_on_account_id,index_client_applications_on_public_key                                                               | PRIMARY                                           | 4       | cloudlog_production.client_application_versions.client_application_id  |    1 | Using where                                  |
|  1 | SIMPLE      | cloud_logs                       | ref    | PRIMARY,index_cloud_logs_on_client_application_version_id                                                                                             | index_cloud_logs_on_client_application_version_id | 5       | cloudlog_production.client_application_versions.id                     |  481 | Using where; Using index                     |
|  1 | SIMPLE      | cloud_logs_logs_join             | eq_ref | PRIMARY,index_cloud_logs_on_client_application_version_id                                                                                             | PRIMARY                                           | 4       | cloudlog_production.cloud_logs.id                                      |    1 |                                              |
|  1 | SIMPLE      | client_application_versions_logs | eq_ref | PRIMARY                                                                                                                                               | PRIMARY                                           | 4       | cloudlog_production.cloud_logs_logs_join.client_application_version_id |    1 | Using index                                  |
|  1 | SIMPLE      | logs                             | ref    | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id                        | 5       | cloudlog_production.cloud_logs.id                                      |    4 | Using where                                  |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+

这个问题有三个部分:

  1. 我是否可以使用额外的索引来优化我的数据库,以帮助这些类型的依赖于连接的排序查询变得更有性能?
  2. 我可以优化rails代码以帮助这种类型的find以更高的性能运行吗?
  3. 我是否只是简单地接近这个作用域,为大型数据集找到了错误的方法?

更新1/24/12

正如Geoff和J_MCCaffrey在答案中所建议的那样,我已经将查询分成了三个不同的部分来尝试和隔离这个问题。正如预期的那样,这是一个处理最大表格的问题。MYSQL优化器通过使用不同的索引来不同地处理这个问题,但是延迟仍然存在。以下是对这种方法的解释。

代码语言:javascript
运行
复制
ClientApplication.find_by_account_id_and_public_key(1, 'p0kZudG0').versions.select{|cav| cav.public_key = '0HgoJRyE'}.first.logs.page(2)
  ClientApplication Load (165.9ms)  SELECT `client_applications`.* FROM `client_applications` WHERE `client_applications`.`account_id` = 1 AND `client_applications`.`public_key` = 'p0kZudG0' AND (client_applications.deleted_at IS NULL) ORDER BY client_applications.id LIMIT 1
  ClientApplicationVersion Load (105.1ms)  SELECT `client_application_versions`.* FROM `client_application_versions` WHERE `client_application_versions`.`client_application_id` = 3 AND (client_application_versions.deleted_at IS NULL) ORDER BY client_application_versions.created_at DESC, client_application_versions.id DESC
  Log Load (57295.0ms)  SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
  EXPLAIN (214.5ms)  EXPLAIN SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN for: SELECT  `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table      | type        | possible_keys                                                                                                                                         | key                                                                              | key_len | ref                               | rows | Extra                                                                                                                                           |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | cloud_logs | index_merge | PRIMARY,index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at                                                              | index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | 5,9     | NULL                              | 1874 | Using intersect(index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at); Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | logs       | ref         | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id                                                       | 5       | cloudlog_production.cloud_logs.id |    4 | Using where                                                                                                                                     |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+

更新1/25/12

以下是所有相关表格的索引:

代码语言:javascript
运行
复制
CLIENT_APPLICATIONS:
  PRIMARY KEY  (`id`),
  UNIQUE KEY `index_client_applications_on_key` (`key`),
  KEY `index_client_applications_on_account_id` (`account_id`),
  KEY `index_client_applications_on_deleted_at` (`deleted_at`),
  KEY `index_client_applications_on_public_key` (`public_key`)

CLIENT_APPLICATION_VERSIONS:
  PRIMARY KEY  (`id`),
  KEY `index_client_application_versions_on_client_application_id` (`client_application_id`),
  KEY `index_client_application_versions_on_deleted_at` (`deleted_at`),
  KEY `index_client_application_versions_on_public_key` (`public_key`)

CLOUD_LOGS:
  PRIMARY KEY  (`id`),
  KEY `index_cloud_logs_on_api_client_version_id` (`api_client_version_id`),
  KEY `index_cloud_logs_on_client_application_version_id` (`client_application_version_id`),
  KEY `index_cloud_logs_on_deleted_at` (`deleted_at`),
  KEY `index_cloud_logs_on_device_id` (`device_id`),
  KEY `index_cloud_logs_on_public_key` (`public_key`),
  KEY `index_cloud_logs_on_received_at` (`received_at`)

LOGS:
  PRIMARY KEY  (`id`),
  KEY `index_logs_on_class_name` (`class_name`),
  KEY `index_logs_on_cloud_log_id_and_deleted_at_and_timestamp` (`cloud_log_id`,`deleted_at`,`timestamp`),
  KEY `index_logs_on_cloud_log_id_and_deleted_at` (`cloud_log_id`,`deleted_at`),
  KEY `index_logs_on_cloud_log_id` (`cloud_log_id`),
  KEY `index_logs_on_deleted_at` (`deleted_at`),
  KEY `index_logs_on_file_name` (`file_name`),
  KEY `index_logs_on_method_name` (`method_name`),
  KEY `index_logs_on_public_key` (`public_key`),
  KEY `index_logs_on_timestamp` USING BTREE (`timestamp`)
EN

回答 7

Stack Overflow用户

回答已采纳

发布于 2013-01-20 16:13:13

我写这封信是为了解决我自己的问题,希望能找到更好的答案。目前,DB是完全按书本关系设置的.

代码语言:javascript
运行
复制
ClientApplication         has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs                 has_many => Logs

这意味着,当我需要找到属于客户端应用程序的日志时,我必须做3个额外的联接才能得到它。通过在日志表中引入一些foreign_key去规范化,我可以跳过所有的联接:

代码语言:javascript
运行
复制
ClientApplication         has_many => ClientApplicationVersions
ClientApplication         has_many => Logs
ClientApplicationVersions has_many => CloudLogs
ClientApplicationVersions has_many => Logs
CloudLogs                 has_many => Logs

最终的结果是,我的日志表中会有一些额外的列:client_application_keyclient_application_version_keycloud_log_key

尽管我冒着数据不一致的风险,但我还是可以避免这里的3个联接,因为这有助于降低查询的性能。谁来劝我别这样。

票数 0
EN

Stack Overflow用户

发布于 2013-01-25 02:27:36

以更好的结构显示它,查询看起来如下(已经重新排列)

代码语言:javascript
运行
复制
SELECT
  `logs`.*
FROM
  `logs` as l
  INNER JOIN `cloud_logs` as cl1
    ON
      cl1.id = l.cloud_log_id
  INNER JOIN `cloud_logs` as cl2
    ON
      cl2.id = l.cloud_log_id
  INNER JOIN `client_application_versions` as cav1
    ON
      cav1.id = cl1.client_application_version_id
  INNER JOIN `client_application_versions` as cav2
    ON
      cav2.id = cl2.client_application_version_id
  INNER JOIN `client_applications` as ca
    ON
      ca.id = cav1.client_application_id
WHERE
  (l.deleted_at IS NULL)
    AND
  (ca.account_id = 1)
    AND
  (ca.public_key = 'p0kZudG0')
    AND
  (cav.public_key = '0HgoJRyE')
ORDER BY
  logs.timestamp DESC
LIMIT
  0, 100

当查看cav1/cl1和cav2/cl2时,人们可以看到,cav2和cl2从未被使用过。除了ON语句之外,没有应用于它们的筛选器。

因此,cav1链接到正确的accoutn,cav2不链接到任何帐户,而是包含所有匹配的帐户。这不是查询结果的问题,而是连接缓冲区的大小问题。

移除连接(以及它们的部分)会产生:

代码语言:javascript
运行
复制
SELECT
  `logs`.*
FROM
  `logs` as l
  INNER JOIN `cloud_logs` as cl1
    ON
      cl1.id = l.cloud_log_id
--  INNER JOIN `cloud_logs` as cl2
--    ON
--      cl2.id = l.cloud_log_id
  INNER JOIN `client_application_versions` as cav1 use index for join (`index_cavs_on_client_application_id_and_public_key`)
    ON
      cav1.id = cl1.client_application_version_id
        AND
      cav1.public_key = '0HgoJRyE'

--  INNER JOIN `client_application_versions` as cav2
--    ON
--      cav2.id = cl2.client_application_version_id
  INNER JOIN `client_applications` as ca
    ON
      ca.id = cav1.client_application_id
WHERE
  (l.deleted_at IS NULL)
    AND
  (ca.account_id = 1)
    AND
  (ca.public_key = 'p0kZudG0')
ORDER BY
  logs.timestamp DESC
LIMIT
  0, 100

这个应该更快。

将其打包到可在控制台中使用的东西(假设正确的表关系和meta_where):

代码语言:javascript
运行
复制
Log.where(:deleted_at.ne => nil).order("logs.timestamp desc").joins(:cloud_logs) & \
CloudLog.joins(:client_application_versions) & \
ClientApplicationVersion.where(:public_key => '0HgoJRyE').joins(:client_applications) & \
ClientApplication.where(:public_key => 'p0kZudG0', :account_id => 1)

因为我不能在这里再现这个,所以您可能需要自己尝试(或者在后面添加一个to_sql ),还可以为上面的简短查询添加一个解释。

结果可能会很有趣。

更新:在看到结果和定义之后(以下评论):

尝试添加一个键:

代码语言:javascript
运行
复制
alter table client_application_versions add key (`client_application_id`, `public_key`);

这应该可以防止文件短缺并加快速度。

编辑:更新的查询提示mysql有关键。

票数 1
EN

Stack Overflow用户

发布于 2013-01-19 22:33:23

不幸的是,我的Rails优化经验都是使用PostgreSQL的,所以其中大部分可能不适用。不过,我确实有几个可能适用的建议:

尝试在您的作用域中使用joins而不是includes -- includes用于触发急切的加载--您所看到的某些减速完全可能是不必要的模型正在加载。即使不是,使用joins应该会产生一个更易读的查询--是includes将所有列混成't2_r8',等等。

此外,您还需要确保所有可能被过滤的列都被索引--一般来说,以_id结尾的列可能会以这种方式被引用,并且可能应该被索引,以及您在作用域中专门筛选的任何列(比如client_application_version_key)。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/14418134

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档