我有两个表(它们更大,有超过200,000行,这只是一个例子)
Table1:temptable
id |Chromosome | Start | End | Reference | Alternative | QUAL | MQ
........................................................................
NULL chr1 12334 12335 A TT 10 20
NULL chr1 12334 12335 C TT 5 3
NULL chr2 123 123 A T 1 2
NULL chr2 34 34 A T 11 60
NULL chr3 12 12 A T 6 NULL
Table2:TableVariants2
id | Chromosome | Start | End | Reference | Alternative
.............................................................
1 chr1 12334 12335 A TT
2 chr1 12334 12335 C TT
3 chr2 123 123 A T
4 chr2 34 34 A T
5 chr3 12 12 A T
我想根据相等的染色体、开始、结束、引用和替代来合并它们,并获得:
id |Chromosome | Start | End | Reference | Alternative | QUAL | MQ
........................................................................
1 chr1 12334 12335 A TT 10 20
2 chr1 12334 12335 C TT 5 3
3 chr2 123 123 A T 1 2
4 chr2 34 34 A T 11 60
5 chr3 12 12 A T 6 NULL
我已经尝试过以下方法,但对于大桌子来说太慢了,我需要更快的方法
SELECT B.Maxid, A.Chromosome, A.Start, A.End, A.Reference, A.Alternative , A.QUAL,A.MQ
FROM temptable A
INNER JOIN ( SELECT `Chromosome`, `Start`, `End`, `Reference`,`Alternative`, MAX(id) AS Maxid FROM TableVariants2 GROUP BY `Chromosome`, `Start`, `End`, `Reference`,`Alternative`) B
ON A.Chromosome=B.Chromosome AND A.Start=B.Start AND A.End=B.End AND A.Reference=B.Reference AND A.Alternative=B.Alternative
为了加快查询速度,除了内连接之外,还有什么替代方法吗?
发布于 2019-02-28 17:29:30
在编写查询时,您可以遵循以下几点:
1. Analyze your data.
2. Make sure you have created necessary indexes.
3. In Select statement write only required column, avoid unnecessary columns.
4. Do not write subquery or co-related query until and unless is required.
让我们看看遵循这些步骤是否会有所帮助。但是内连接是基本的连接,如果你必须连接两个表,它是不可替代的。
发布于 2019-03-08 07:15:30
INDEX(Chromosome, Start, End, Reference, Alternative, id)
--与GROUP BY
的顺序相同。
https://stackoverflow.com/questions/54922052
复制相似问题