我正在尝试用SQL预测订单流程的开始/结束时间。我已经确定了过去过程的平均持续时间。这些进程在多个并行的行(RNr)中运行,并且行彼此独立。每行可以有1-30个具有不同持续时间的进程(PNr)。过程的持续时间可能会有所不同,并且仅称为平均持续时间。一个进程完成后,下一个进程会自动启动。PNr 1 finish = PNr 2 start。
每行中第一个进程的开始时间在开始时是已知的,并且每行的开始时间是相同的。当一些过程完成时,时间是已知的,应该用来计算对即将到来的过程的更准确的预测。如何预测进程启动或停止的时间?
我使用了一个大型的子查询来获取这个表。
RNr PNr Duration_avg_h Start Finish
1 1 1 2019-06-06 16:32:11 2019-06-06 16:33:14
1 2 262 2019-06-06 16:33:14 NULL
1 3 51 NULL NULL
1 4 504 NULL NULL
1 5 29 NULL NULL
2 1 1 2019-06-06 16:32:11 NULL
2 2 124 NULL NULL
2 3 45 NULL NULL
2 4 89 NULL NULL
2 5 19 NULL NULL
2 6 1565 NULL NULL
2 7 24 NULL NULL现在我想找出预测的值。
SELECT
RNr,
PNr,
Duration_avg_h,
Start,
Finish,
Predicted_Start = CASE
WHEN Start IS NULL
THEN DATEADD(HH,LAG(Duration_avg_h, 1,NULL) OVER (ORDER BY RNr,PNr), LAG(Start, 1,NULL) OVER (ORDER BY RNr,PNr))
ELSE Start END,
Predicted_Finish = CASE
WHEN Finish IS NULL
THEN DATEADD(HH,Duration_avg_h,Start)
ELSE Finish END,
SUM(Duration_avg_h) over (PARTITION BY RNr ORDER BY RNr, PNr ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Duration_row_h
FROM (...)
ORDER BY RNr, PNr我尝试了LAG (),但是我只得到了下一行的值。我也没有得出“无界前行和当前行之间的行”的结论。
RNr PNr Duration_avg_h Start Finish Predicted_Start Predicted_Finish Duration_row_h
1 1 1 2019-06-06 16:32:11 2019-06-06 16:33:14 2019-06-06 16:32:11 2019-06-06 16:33:14 1
1 2 262 2019-06-06 16:33:14 NULL 2019-06-06 16:33:14 2019-06-17 14:33:14 263
1 3 51 NULL NULL 2019-06-17 14:33:14 NULL 314
1 4 504 NULL NULL NULL NULL 818
1 5 29 NULL NULL NULL NULL 847
2 1 1 2019-06-06 16:32:11 NULL 2019-06-06 16:32:11 2019-06-06 17:32:11 1
2 2 124 NULL NULL 2019-06-06 17:32:11 NULL 125
2 3 45 NULL NULL NULL NULL 170
2 4 89 NULL NULL NULL NULL 259
2 5 19 NULL NULL NULL NULL 278那么,有人能帮我填充Predicted_Start和Predicted_Finish列吗?
发布于 2019-09-17 21:21:32
只有当您的所有行都有值时,LAG才有效。对于此用例,您需要将结果从一行级联到另一行。这样做的一种方法是使用自连接来获取运行总数
--Sample Data
DECLARE @dataset TABLE
(
RNr INT
,PNr INT
,Duration_avg_h INT
,START DATETIME
,Finish DATETIME
)
INSERT INTO @dataset
(
RNr
,PNr
,Duration_avg_h
,START
,Finish
)
VALUES
(1, 1, 1, '2019-06-06 16:32:11',NULL)
,(1, 2, 262, NULL,NULL)
,(1, 3, 51, NULL,NULL)
,(1, 4, 504, NULL,NULL)
,(1, 5, 29, NULL,NULL)
,(2, 1, 1, '2019-06-06 16:32:11', NULL)
,(2, 2, 124, NULL,NULL)
,(2, 3, 45, NULL,NULL)
,(2, 4, 89, NULL,NULL)
,(2, 5, 19, NULL,NULL)
,(2, 6, 1565, NULL,NULL)
,(2, 7, 24, NULL,NULL)SELECT
d.RNr,
d.PNr,
d.Duration_avg_h,
d.Start,
d.Finish,
--SUM() gives us the total time up to and including this step
--take of the current step and you get the total time of all the previous steps
--this can give us our start time, or when the previous step ended.
SUM(running_total.Duration_avg_h) - d.Duration_avg_h AS running_total_time,
--MIN() gives us the lowest start time we have pre process.
MIN(running_total.Start) AS min_start,
ISNULL(
d.Start
,DATEADD(HH,SUM(running_total.Duration_avg_h),MIN(running_total.Start) )
) AS Predicted_Start,
ISNULL(
d.Finish
,DATEADD(HH,SUM(running_total.Duration_avg_h),MIN(running_total.Start) )
) AS Predicted_Finish
FROM @dataset AS d
LEFT JOIN @dataset AS running_total
ON d.RNr = running_total.RNr
AND
--the running total for all steps.
running_total.PNr <= d.PNr
GROUP BY
d.RNr,
d.PNr,
d.Duration_avg_h,
d.Start,
d.Finish
ORDER BY
RNr,
PNr一旦您有了实际的完成时间,此代码将无法工作,除非您将Duration_avg_h更新为实际所用的小时数。
https://stackoverflow.com/questions/57973482
复制相似问题