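I have two sets of data from an external source: customers' purchase dates and customers' last email click/open dates. They are stored in two separate tables, PURCHASE_INTER and ACTIVITY_INTER. A customer can have several purchase rows, and I need the latest purchase date, whereas the activity data is unique per customer. The two data sets are independent of each other, and a customer may appear in one but not the other. We wrote the query below. It combines the two tables, groups them by person_id (the customer's id from the external source) and takes the latest dates, joins to our customer table to get the customer's email, and then joins to another table where the data will eventually be stored, so that we know whether each row will be an insert or an update. Can you suggest how to improve the performance of this query? It is far too slow and takes more than 10 hours. The PURCHASE_INTER and ACTIVITY_INTER tables contain millions of records.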
SELECT INTER.*, C.ID AS CUSTOMER_ID, C.EMAIL AS CUSTOMER_EMAIL, LSI.ID AS INTERACTION_ID, ROW_NUMBER() OVER (ORDER BY PERSON_ID ASC) AS RN FROM (
    SELECT PERSON_ID AS PERSON_ID,
           MAX(LAST_CLICK_DATE) AS LAST_CLICK_DATE,
           MAX(LAST_OPEN_DATE) AS LAST_OPEN_DATE,
           MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
    FROM (
        SELECT ACT.PERSON_ID AS PERSON_ID,
               ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
               ACT.LAST_OPEN_DATE AS LAST_OPEN_DATE,
               NULL AS LAST_PURCHASE_DATE
        FROM ACTIVITY_INTER ACT
        WHERE ACT.JOB_ID = 77318317
        UNION
        SELECT PUR.PERSON_ID AS PERSON_ID,
               NULL AS LAST_CLICK_DATE,
               NULL AS LAST_OPEN_DATE,
               PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
        FROM PURCHASE_INTER PUR
        WHERE PUR.JOB_ID = 77318317
    ) GROUP BY PERSON_ID
) INTER LEFT JOIN CUSTOMER C ON INTER.PERSON_ID = C.PERSON_ID
LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;

Posted on 2016-03-30 14:25:32
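Your query suggests the following indexes:

ACTIVITY_INTER(JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE)
PURCHASE_INTER(JOB_ID, PERSON_ID, LAST_PURCHASE_DATE)
CUSTOMER(PERSON_ID)
INTERACTION(CUSTOMER_ID)

(For the first two indexes, the first column matters more than the remaining columns, unless the number of matching rows is very large.)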
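As a minimal sketch, the index list above could be written as Oracle DDL along the following lines. The index names (IX_...) are hypothetical and only chosen for illustration; the table and column names are taken from the query in the question. If an index on the same column list already exists, Oracle rejects the duplicate, so only the missing ones need to be created.

```sql
-- Index names are hypothetical; tables and columns come from the question's query.

-- Covers the JOB_ID filter, the GROUP BY column, and the aggregated dates,
-- so the aggregation may be answerable from the index alone.
CREATE INDEX IX_ACTIVITY_INTER_JOB
    ON ACTIVITY_INTER (JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE);

CREATE INDEX IX_PURCHASE_INTER_JOB
    ON PURCHASE_INTER (JOB_ID, PERSON_ID, LAST_PURCHASE_DATE);

-- Support the two LEFT JOIN lookups.
CREATE INDEX IX_CUSTOMER_PERSON
    ON CUSTOMER (PERSON_ID);

CREATE INDEX IX_INTERACTION_CUSTOMER
    ON INTERACTION (CUSTOMER_ID);
```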
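Also, change the UNION to UNION ALL. UNION incurs the overhead of removing duplicates, which cannot occur here (at least not between the two subqueries), because each subquery fills different columns. Since the outer query only takes MAX per PERSON_ID, any duplicates would not change the result anyway.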
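For illustration, this is roughly what the inner aggregation from the question looks like with that single change applied. It is a sketch that reuses the question's table names, column names, and JOB_ID value unchanged; nothing else in the logic is altered.

```sql
SELECT PERSON_ID,
       MAX(LAST_CLICK_DATE)    AS LAST_CLICK_DATE,
       MAX(LAST_OPEN_DATE)     AS LAST_OPEN_DATE,
       MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
FROM (
    SELECT ACT.PERSON_ID       AS PERSON_ID,
           ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
           ACT.LAST_OPEN_DATE  AS LAST_OPEN_DATE,
           NULL                AS LAST_PURCHASE_DATE
    FROM ACTIVITY_INTER ACT
    WHERE ACT.JOB_ID = 77318317
    UNION ALL  -- no duplicate-elimination step, unlike UNION
    SELECT PUR.PERSON_ID          AS PERSON_ID,
           NULL                   AS LAST_CLICK_DATE,
           NULL                   AS LAST_OPEN_DATE,
           PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
    FROM PURCHASE_INTER PUR
    WHERE PUR.JOB_ID = 77318317
)
GROUP BY PERSON_ID;
```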
In addition, you might want to replace the first subquery with a full outer join:
SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) as PERSON_ID,
       a.LAST_CLICK_DATE, a.LAST_OPEN_DATE, p.LAST_PURCHASE_DATE
FROM (SELECT ACT.PERSON_ID AS PERSON_ID,
             MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
             MAX(ACT.LAST_OPEN_DATE) AS LAST_OPEN_DATE
      FROM ACTIVITY_INTER ACT
      WHERE ACT.JOB_ID = 77318317
      GROUP BY ACT.PERSON_ID
     ) a FULL OUTER JOIN
     (SELECT PUR.PERSON_ID AS PERSON_ID,
             MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
      FROM PURCHASE_INTER PUR
      WHERE PUR.JOB_ID = 77318317
      GROUP BY PUR.PERSON_ID
     ) p
     ON a.PERSON_ID = p.PERSON_ID

This gives Oracle more options for optimization, because the aggregation is done directly on the tables, which allows better use of the indexes and better statistics for processing.
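To show how the pieces might fit together, here is a sketch of the complete query with the full outer join placed inside the question's outer SELECT. It assumes PERSON_ID is never NULL in either staging table: the original GROUP BY would collapse NULL ids into one row, while a join on a.PERSON_ID = p.PERSON_ID never matches them. The ORDER BY column in ROW_NUMBER is qualified with INTER here as a precaution, since CUSTOMER also has a PERSON_ID column.

```sql
SELECT INTER.*,
       C.ID    AS CUSTOMER_ID,
       C.EMAIL AS CUSTOMER_EMAIL,
       LSI.ID  AS INTERACTION_ID,
       ROW_NUMBER() OVER (ORDER BY INTER.PERSON_ID ASC) AS RN
FROM (
    -- One row per PERSON_ID, whether it appears in one table or both.
    SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) AS PERSON_ID,
           a.LAST_CLICK_DATE,
           a.LAST_OPEN_DATE,
           p.LAST_PURCHASE_DATE
    FROM (SELECT ACT.PERSON_ID,
                 MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
                 MAX(ACT.LAST_OPEN_DATE)  AS LAST_OPEN_DATE
          FROM ACTIVITY_INTER ACT
          WHERE ACT.JOB_ID = 77318317
          GROUP BY ACT.PERSON_ID
         ) a
    FULL OUTER JOIN
         (SELECT PUR.PERSON_ID,
                 MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
          FROM PURCHASE_INTER PUR
          WHERE PUR.JOB_ID = 77318317
          GROUP BY PUR.PERSON_ID
         ) p
      ON a.PERSON_ID = p.PERSON_ID
) INTER
LEFT JOIN CUSTOMER C      ON INTER.PERSON_ID = C.PERSON_ID
LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;
```

Whether this actually runs faster depends on the data and the available indexes, so it is worth comparing the execution plans of both variants before switching.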
https://stackoverflow.com/questions/36311608