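I'm using SQL2000. I have been able to remove all unwanted duplicates based on a complex set of criteria, but the query now takes hours to complete, whereas it used to take only 3.5 minutes to return the data with the duplicates included.
For clarity: I can have a duplicate rpt.Name field as long as either the rpt.HostName or the rpt.SystemSerialNumber field is also different. In addition, I have to decide which entry to keep based on the timestamps in four different columns, because some of those columns have missing timestamps.
Any help is greatly appreciated!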
SELECT
rpt.[Name],
rpt.LastAgentExecution,
rpt.GroupName,
rpt.PackageName,
rpt.PackageVersion,
rpt.ProcedureName,
rpt.HostName,
rpt.SystemSerialNumber,
rpt.JobCreationTime,
rpt.JobActivationTime,
rpt.[Job Completion Time]
FROM DSM_StandardGroupMembersProcedureActivityViewExt rpt
WHERE
(
(
rpt.GroupName = 'Adobe Acrobat 7 Deploy'
OR rpt.GroupName = 'Adobe Acrobat 8 Deploy'
)
AND
(
(rpt.PackageName = 'Adobe Acrobat 7' AND rpt.PackageVersion = '-1.0')
OR (rpt.PackageName = 'Adobe Acrobat 8' AND rpt.PackageVersion = '-3.0')
)
)
AND NOT EXISTS
(
SELECT *
FROM DSM_StandardGroupMembersProcedureActivityViewExt rpt_dupe
WHERE
(
(
rpt.GroupName = 'Adobe Acrobat 7 Deploy'
OR rpt.GroupName = 'Adobe Acrobat 8 Deploy'
)
AND
(
(rpt.PackageName = 'Adobe Acrobat 7' AND rpt.PackageVersion = '-1.0')
OR (rpt.PackageName = 'Adobe Acrobat 8' AND rpt.PackageVersion = '-3.0')
)
AND
(
(rpt_dupe.[Name] = rpt.[Name])
AND
(
(rpt_dupe.SystemSerialNumber = rpt.SystemSerialNumber)
OR (rpt_dupe.HostName = rpt.HostName)
)
AND
(
(rpt_dupe.LastAgentExecution < rpt.LastAgentExecution)
OR (rpt_dupe.JobActivationTime < rpt.JobActivationTime)
OR (rpt_dupe.JobCreationTime < rpt.JobCreationTime)
OR (rpt_dupe.[Job Completion Time] < rpt.[Job Completion Time])
)
)
)
)
Posted on 2012-05-11 08:53:18
The cause is the NOT EXISTS clause.
One suggestion is to rewrite it as a left outer join:
from <big query> left outer join
<dups query>
on <all the fields that constitute a match>
where <dups query>.<some field> is null
I find that NOT EXISTS is often poorly optimized.
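To make the rewrite concrete, here is a minimal sketch of the two forms side by side. It is not the original query. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and a made-up four-row table `rpt` with a hypothetical `id` column and a single timestamp column, purely to show that NOT EXISTS and LEFT OUTER JOIN ... IS NULL return the same rows. As in the question, a row is dropped when a matching row with an earlier timestamp exists.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rpt (
    id INTEGER PRIMARY KEY,      -- hypothetical key, not in the original view
    Name TEXT,
    HostName TEXT,
    LastAgentExecution TEXT      -- ISO dates, so string comparison orders them
);
INSERT INTO rpt VALUES
    (1, 'pc-a', 'host1', '2012-05-01'),
    (2, 'pc-a', 'host1', '2012-05-03'),
    (3, 'pc-a', 'host2', '2012-05-02'),
    (4, 'pc-b', 'host3', '2012-05-04');
""")

# Original shape: keep a row only if no matching row has an earlier timestamp.
not_exists_sql = """
SELECT r.id
FROM rpt r
WHERE NOT EXISTS (
    SELECT 1
    FROM rpt d
    WHERE d.Name = r.Name
      AND d.HostName = r.HostName
      AND d.LastAgentExecution < r.LastAgentExecution
)
ORDER BY r.id
"""

# Anti-join shape: join to the would-be duplicates, keep rows that found none.
left_join_sql = """
SELECT r.id
FROM rpt r
LEFT OUTER JOIN rpt d
    ON  d.Name = r.Name
    AND d.HostName = r.HostName
    AND d.LastAgentExecution < r.LastAgentExecution
WHERE d.id IS NULL
ORDER BY r.id
"""

kept_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
kept_left_join = [row[0] for row in conn.execute(left_join_sql)]
print(kept_not_exists, kept_left_join)
```

On the real view, the ON clause would have to carry the full match condition (name, serial number or host name, and the timestamp comparisons). Whether the join form is actually faster on SQL 2000 is something to confirm against the execution plan.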
Another suggestion is to change this query to a more direct implementation:
with t as (
SELECT rpt.[Name], rpt.LastAgentExecution, rpt.GroupName, rpt.PackageName,
rpt.PackageVersion, rpt.ProcedureName, rpt.HostName, rpt.SystemSerialNumber,
rpt.JobCreationTime, rpt.JobActivationTime, rpt.[Job Completion Time]
FROM DSM_StandardGroupMembersProcedureActivityViewExt rpt
WHERE rpt.GroupName in ('Adobe Acrobat 7 Deploy', 'Adobe Acrobat 8 Deploy') AND
((rpt.PackageName = 'Adobe Acrobat 7' AND rpt.PackageVersion = '-1.0') OR
(rpt.PackageName = 'Adobe Acrobat 8' AND rpt.PackageVersion = '-3.0')
)
)
select t.*
from t join
(select name, ..., max(id) as id
from t
group by name, ...
) tsum
on t.id = tsum.id
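That is, summarize the table by the columns you want to be distinct and pick one row from each group. Here, I am assuming there is an "id" field that uniquely identifies each row. You may have to use a combination of fields instead, such as name and date. Without an id, this gets more challenging. In newer versions of SQL Server (2005 and later), you can use row_number(). Note that the WITH clause used above also requires 2005 or later, so on SQL 2000 the filtered query would have to be written as a derived table or a view.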
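A minimal sketch of the group-and-join-back idea, again using Python's sqlite3 module as a stand-in for SQL Server. The table `rpt`, its rows, and the `id` column are all hypothetical. Grouping on Name, HostName and SystemSerialNumber together is a simplification of the rule in the question. The summary is written as a derived table, so nothing here depends on WITH or row_number().

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rpt (
    id INTEGER PRIMARY KEY,      -- hypothetical unique row identifier
    Name TEXT,
    HostName TEXT,
    SystemSerialNumber TEXT
);
INSERT INTO rpt VALUES
    (1, 'pc-a', 'host1', 'SN1'),
    (2, 'pc-a', 'host1', 'SN1'),   -- duplicate of row 1
    (3, 'pc-a', 'host2', 'SN2'),   -- same name, different host: allowed
    (4, 'pc-b', 'host3', 'SN3');
""")

# Summarize by the columns that must be distinct, keep one id per group,
# then join back to the base rows to recover the remaining columns.
dedupe_sql = """
SELECT t.id, t.Name, t.HostName, t.SystemSerialNumber
FROM rpt t
JOIN (
    SELECT Name, HostName, SystemSerialNumber, MAX(id) AS id
    FROM rpt
    GROUP BY Name, HostName, SystemSerialNumber
) tsum
    ON t.id = tsum.id
ORDER BY t.id
"""

rows = conn.execute(dedupe_sql).fetchall()
kept_ids = [row[0] for row in rows]
print(kept_ids)
```

MAX(id) keeps the highest id in each group. If the row to keep should be chosen by timestamp instead, the aggregate and the join condition would need to use the timestamp columns, which is where the missing values in the question make things harder.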
Posted on 2012-05-11 07:43:30
Try something like this:
SELECT t_main.columns
FROM table as t_main
LEFT JOIN
(
SELECT name, MAX(lastAgentExecution)..... FROM table GROUP BY name,serialnumber, hostname
)
as t_joinSerial
ON t_main.name=t_joinSerial.name,lastAgentExecution etc.
where (t_main.AdobeStuff and t_joinSerial is NULL)
https://stackoverflow.com/questions/10543165