我有一个相对较大的4深度关系数据设置,如下所示:
ClientApplication has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs has_many => Logsclient_applications:(潜在的1000条记录)
成型机- ...
成型机- account_id
成型机- public_key
成型机- deleted_at
client_application_versions:(可能有10,000个记录)
成型机- ...
成型机- client_application_id
成型机- public_key
成型机- deleted_at
cloud_logs:(可能有1,000,000个记录)
再来.
成型机- client_application_version_id
成型机- public_key
成型机- deleted_at
logs:(可能有10亿的记录)
成型机- ...
成型机- cloud_log_id
成型机- public_key
成型机- time_stamp
成型机- deleted_at
我还在开发,所以结构和设置不是一成不变的,但我希望它是好的。使用Rails 3.2.11和InnoDB MySQL。数据库中当前填充了一小部分数据(与最终的db大小相比) (logs只有50万行),我有4个作用域查询,其中3个是有问题的,用于检索日志。
account_id、client_application.public_key、client_application_version.public_key的限制(超过100秒)account_id、client_application.public_key限制(超过100秒)account_id限制(超过100秒)我使用rails作用域来帮助进行以下调用:
scope :account_id, proc {|account_id| joins(:client_application).where("client_applications.account_id = ?", account_id) }
scope :client_application_key, proc {|client_application_key| joins(:client_application).where("client_applications.public_key = ?", client_application_key) }
scope :client_application_version_key, proc {|client_application_version_key| joins(:client_application_version).where("client_application_versions.public_key = ?", client_application_version_key) }
default_scope order('logs.timestamp DESC')我在public_key上的每个表上都有索引。我在logs表上有几个索引,包括优化器喜欢使用的索引(index_logs_on_cloud_log_id),但是这些查询仍然要花费很长时间才能运行。
下面是如何在rails console中调用该方法
Log.account_id(1).client_application_key('p0kZudG0').client_application_version_key('0HgoJRyE').page(1)..。下面是rails将其转化为什么:
SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `cloud_logs`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` ON `client_application_versions`.`id` = `cloud_logs`.`client_application_version_id` INNER JOIN `client_applications` ON `client_applications`.`id` = `client_application_versions`.`client_application_id` INNER JOIN `cloud_logs` `cloud_logs_logs_join` ON `cloud_logs_logs_join`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` `client_application_versions_logs` ON `client_application_versions_logs`.`id` = `cloud_logs_logs_join`.`client_application_version_id` WHERE (logs.deleted_at IS NULL) AND (client_applications.account_id = 1) AND (client_applications.public_key = 'p0kZudG0') AND (client_application_versions.public_key = '0HgoJRyE') ORDER BY logs.timestamp DESC LIMIT 100 OFFSET 0..。这是这个查询的EXPLAIN语句。
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | client_application_versions | ref | PRIMARY,index_client_application_versions_on_client_application_id,index_client_application_versions_on_public_key | index_client_application_versions_on_public_key | 768 | const | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | client_applications | eq_ref | PRIMARY,index_client_applications_on_account_id,index_client_applications_on_public_key | PRIMARY | 4 | cloudlog_production.client_application_versions.client_application_id | 1 | Using where |
| 1 | SIMPLE | cloud_logs | ref | PRIMARY,index_cloud_logs_on_client_application_version_id | index_cloud_logs_on_client_application_version_id | 5 | cloudlog_production.client_application_versions.id | 481 | Using where; Using index |
| 1 | SIMPLE | cloud_logs_logs_join | eq_ref | PRIMARY,index_cloud_logs_on_client_application_version_id | PRIMARY | 4 | cloudlog_production.cloud_logs.id | 1 | |
| 1 | SIMPLE | client_application_versions_logs | eq_ref | PRIMARY | PRIMARY | 4 | cloudlog_production.cloud_logs_logs_join.client_application_version_id | 1 | Using index |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+这个问题有三个部分:
find以更高的性能运行吗?更新1/24/12
正如Geoff和J_MCCaffrey在答案中所建议的那样,我已经将查询分成了三个不同的部分来尝试和隔离这个问题。正如预期的那样,这是一个处理最大表格的问题。MYSQL优化器通过使用不同的索引来不同地处理这个问题,但是延迟仍然存在。以下是对这种方法的解释。
ClientApplication.find_by_account_id_and_public_key(1, 'p0kZudG0').versions.select{|cav| cav.public_key = '0HgoJRyE'}.first.logs.page(2)
ClientApplication Load (165.9ms) SELECT `client_applications`.* FROM `client_applications` WHERE `client_applications`.`account_id` = 1 AND `client_applications`.`public_key` = 'p0kZudG0' AND (client_applications.deleted_at IS NULL) ORDER BY client_applications.id LIMIT 1
ClientApplicationVersion Load (105.1ms) SELECT `client_application_versions`.* FROM `client_application_versions` WHERE `client_application_versions`.`client_application_id` = 3 AND (client_application_versions.deleted_at IS NULL) ORDER BY client_application_versions.created_at DESC, client_application_versions.id DESC
Log Load (57295.0ms) SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN (214.5ms) EXPLAIN SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN for: SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | cloud_logs | index_merge | PRIMARY,index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | 5,9 | NULL | 1874 | Using intersect(index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+更新1/25/12
以下是所有相关表格的索引:
CLIENT_APPLICATIONS:
PRIMARY KEY (`id`),
UNIQUE KEY `index_client_applications_on_key` (`key`),
KEY `index_client_applications_on_account_id` (`account_id`),
KEY `index_client_applications_on_deleted_at` (`deleted_at`),
KEY `index_client_applications_on_public_key` (`public_key`)
CLIENT_APPLICATION_VERSIONS:
PRIMARY KEY (`id`),
KEY `index_client_application_versions_on_client_application_id` (`client_application_id`),
KEY `index_client_application_versions_on_deleted_at` (`deleted_at`),
KEY `index_client_application_versions_on_public_key` (`public_key`)
CLOUD_LOGS:
PRIMARY KEY (`id`),
KEY `index_cloud_logs_on_api_client_version_id` (`api_client_version_id`),
KEY `index_cloud_logs_on_client_application_version_id` (`client_application_version_id`),
KEY `index_cloud_logs_on_deleted_at` (`deleted_at`),
KEY `index_cloud_logs_on_device_id` (`device_id`),
KEY `index_cloud_logs_on_public_key` (`public_key`),
KEY `index_cloud_logs_on_received_at` (`received_at`)
LOGS:
PRIMARY KEY (`id`),
KEY `index_logs_on_class_name` (`class_name`),
KEY `index_logs_on_cloud_log_id_and_deleted_at_and_timestamp` (`cloud_log_id`,`deleted_at`,`timestamp`),
KEY `index_logs_on_cloud_log_id_and_deleted_at` (`cloud_log_id`,`deleted_at`),
KEY `index_logs_on_cloud_log_id` (`cloud_log_id`),
KEY `index_logs_on_deleted_at` (`deleted_at`),
KEY `index_logs_on_file_name` (`file_name`),
KEY `index_logs_on_method_name` (`method_name`),
KEY `index_logs_on_public_key` (`public_key`),
KEY `index_logs_on_timestamp` USING BTREE (`timestamp`)发布于 2013-01-20 16:13:13
我写这封信是为了解决我自己的问题,希望能找到更好的答案。目前,DB是完全按书本关系设置的.
ClientApplication has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs has_many => Logs这意味着,当我需要找到属于客户端应用程序的日志时,我必须做3个额外的联接才能得到它。通过在日志表中引入一些foreign_key去规范化,我可以跳过所有的联接:
ClientApplication has_many => ClientApplicationVersions
ClientApplication has_many => Logs
ClientApplicationVersions has_many => CloudLogs
ClientApplicationVersions has_many => Logs
CloudLogs has_many => Logs最终的结果是,我的日志表中会有一些额外的列:client_application_key、client_application_version_key和cloud_log_key。
尽管我冒着数据不一致的风险,但我还是可以避免这里的3个联接,因为这有助于降低查询的性能。谁来劝我别这样。
发布于 2013-01-25 02:27:36
以更好的结构显示它,查询看起来如下(已经重新排列)
SELECT
`logs`.*
FROM
`logs` as l
INNER JOIN `cloud_logs` as cl1
ON
cl1.id = l.cloud_log_id
INNER JOIN `cloud_logs` as cl2
ON
cl2.id = l.cloud_log_id
INNER JOIN `client_application_versions` as cav1
ON
cav1.id = cl1.client_application_version_id
INNER JOIN `client_application_versions` as cav2
ON
cav2.id = cl2.client_application_version_id
INNER JOIN `client_applications` as ca
ON
ca.id = cav1.client_application_id
WHERE
(l.deleted_at IS NULL)
AND
(ca.account_id = 1)
AND
(ca.public_key = 'p0kZudG0')
AND
(cav.public_key = '0HgoJRyE')
ORDER BY
logs.timestamp DESC
LIMIT
0, 100当查看cav1/cl1和cav2/cl2时,人们可以看到,cav2和cl2从未被使用过。除了ON语句之外,没有应用于它们的筛选器。
因此,cav1链接到正确的accoutn,cav2不链接到任何帐户,而是包含所有匹配的帐户。这不是查询结果的问题,而是连接缓冲区的大小问题。
移除连接(以及它们的部分)会产生:
SELECT
`logs`.*
FROM
`logs` as l
INNER JOIN `cloud_logs` as cl1
ON
cl1.id = l.cloud_log_id
-- INNER JOIN `cloud_logs` as cl2
-- ON
-- cl2.id = l.cloud_log_id
INNER JOIN `client_application_versions` as cav1 use index for join (`index_cavs_on_client_application_id_and_public_key`)
ON
cav1.id = cl1.client_application_version_id
AND
cav1.public_key = '0HgoJRyE'
-- INNER JOIN `client_application_versions` as cav2
-- ON
-- cav2.id = cl2.client_application_version_id
INNER JOIN `client_applications` as ca
ON
ca.id = cav1.client_application_id
WHERE
(l.deleted_at IS NULL)
AND
(ca.account_id = 1)
AND
(ca.public_key = 'p0kZudG0')
ORDER BY
logs.timestamp DESC
LIMIT
0, 100这个应该更快。
将其打包到可在控制台中使用的东西(假设正确的表关系和meta_where):
Log.where(:deleted_at.ne => nil).order("logs.timestamp desc").joins(:cloud_logs) & \
CloudLog.joins(:client_application_versions) & \
ClientApplicationVersion.where(:public_key => '0HgoJRyE').joins(:client_applications) & \
ClientApplication.where(:public_key => 'p0kZudG0', :account_id => 1)因为我不能在这里再现这个,所以您可能需要自己尝试(或者在后面添加一个to_sql ),还可以为上面的简短查询添加一个解释。
结果可能会很有趣。
更新:在看到结果和定义之后(以下评论):
尝试添加一个键:
alter table client_application_versions add key (`client_application_id`, `public_key`);这应该可以防止文件短缺并加快速度。
编辑:更新的查询提示mysql有关键。
发布于 2013-01-19 22:33:23
不幸的是,我的Rails优化经验都是使用PostgreSQL的,所以其中大部分可能不适用。不过,我确实有几个可能适用的建议:
尝试在您的作用域中使用joins而不是includes -- includes用于触发急切的加载--您所看到的某些减速完全可能是不必要的模型正在加载。即使不是,使用joins应该会产生一个更易读的查询--是includes将所有列混成't2_r8',等等。
此外,您还需要确保所有可能被过滤的列都被索引--一般来说,以_id结尾的列可能会以这种方式被引用,并且可能应该被索引,以及您在作用域中专门筛选的任何列(比如client_application_version_key)。
https://stackoverflow.com/questions/14418134
复制相似问题