我的客户有一个出勤系统,它(大约)在此表单中存储缺勤数据(换句话说,在一天或半天之前):
EmployeeID AbsenceDate AbsenceDays
1 2020-06-25 1
1 2020-06-24 1
1 2020-06-23 1
1 2020-06-22 1
1 2020-06-19 1
1 2020-06-18 1
1 2020-05-25 1
1 2020-06-23 1
1 2020-06-22 0.5
我构建了一份报告,输出“原样”数据,但客户已经询问是否可以采用这种形式(将连续的相关天数汇总成一个包含sum的范围):
EmployeeID StartDate EndDate NoOfDays
1 2020-06-18 2020-06-25 6
1 2020-05-22 2020-06-25 2.5
我已经研究了空隙和孤岛解决方案,但困难的是,对于这两种解决方案,都有一个中间的周末,在这个周末,缺勤数据应该被计算在内,而不是,而不是。是否有任何方法可以使用标准SQL (而不是使用游标或其他ROBAR解决方案,因为显而易见的原因,我宁愿避免)。
发布于 2020-06-25 12:52:54
首先,这种分组可以相对容易地在客户端使用您的经典编程语言,而不是SQL。但是如果你坚持..。
我已经研究了空隙和孤岛解决方案,但困难的是,对于这两种解决方案,都有一个中间的周末,在这个周末,缺勤数据应该被计算在内,而不是,而不是。
其主要思想是为所有周末生成缺失的行,AbsenceDays
值为0,这样,当周末结束时,间隙和岛屿就不会创建额外的范围。
为此,我将使用日历表(一个包含所有日期和各种标志(如IsWeekend
)的表)。
请注意,即使周末有一些AbsenceDates,这种方法也会返回正确的结果。
样本数据
我已经调整了你的样本数据,使它更有趣和明确。(您的示例为同一个EmployeeID
列出了两次相同的日期)
DECLARE @T TABLE (EmployeeID int, AbsenceDate date, AbsenceDays float);
INSERT INTO @T
VALUES
(2, '2020-06-25', 0.5),
(2, '2020-06-24', 0.5),
(2, '2020-06-23', 0.5),
(2, '2020-06-22', 0.5),
(2, '2020-06-19', 0.5),
(2, '2020-06-18', 0.5),
-- here we go across the weekend and both Sat and Sun are skipped
(1, '2020-06-25', 1),
(1, '2020-06-24', 1),
(1, '2020-06-23', 1),
(1, '2020-06-22', 1),
(1, '2020-06-19', 1),
(1, '2020-06-18', 1),
-- here we go across the weekend and both Sat and Sun are skipped
(1, '2020-05-25', 1),
(1, '2020-05-23', 1),
(1, '2020-05-22', 0.5);
-- here we go across the weekend and only Sun is skipped
查询
此查询对所有日期使用Calendar
表和dt
,并使用标志IsWeekend
。
CTE_Boundaries
计算每个员工从日历中需要的日期范围。CTE_Weekends
给我们每个星期六和太阳一排。最后,我们将源表和日历中的日期组合在一起。
WITH
CTE_Boundaries
AS
(
SELECT
EmployeeID
,MIN(AbsenceDate) AS StartDate
,MAX(AbsenceDate) AS EndDate
FROM
@T AS T
GROUP BY
EmployeeID
)
,CTE_Weekends
AS
(
SELECT
CTE_Boundaries.EmployeeID
,Calendar.dt AS AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Boundaries
INNER JOIN Calendar
ON Calendar.dt >= CTE_Boundaries.StartDate
AND Calendar.dt <= CTE_Boundaries.EndDate
WHERE
Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
SELECT
EmployeeID
,AbsenceDate
,AbsenceDays
FROM @T AS T
UNION ALL
SELECT
EmployeeID
,AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Weekends
)
SELECT
EmployeeID
,AbsenceDate
,SUM(AbsenceDays) AS AbsenceDays
FROM CTE_AllDates
GROUP BY
EmployeeID
,AbsenceDate
;
结果
+------------+-------------+-------------+
| EmployeeID | AbsenceDate | AbsenceDays |
+------------+-------------+-------------+
| 1 | 2020-05-22 | 0.5 |
| 1 | 2020-05-23 | 1 |
| 1 | 2020-05-24 | 0 |
| 1 | 2020-05-25 | 1 |
| 1 | 2020-05-30 | 0 |
| 1 | 2020-05-31 | 0 |
| 1 | 2020-06-06 | 0 |
| 1 | 2020-06-07 | 0 |
| 1 | 2020-06-13 | 0 |
| 1 | 2020-06-14 | 0 |
| 1 | 2020-06-18 | 1 |
| 1 | 2020-06-19 | 1 |
| 1 | 2020-06-20 | 0 |
| 1 | 2020-06-21 | 0 |
| 1 | 2020-06-22 | 1 |
| 1 | 2020-06-23 | 1 |
| 1 | 2020-06-24 | 1 |
| 1 | 2020-06-25 | 1 |
| 2 | 2020-06-18 | 0.5 |
| 2 | 2020-06-19 | 0.5 |
| 2 | 2020-06-20 | 0 |
| 2 | 2020-06-21 | 0 |
| 2 | 2020-06-22 | 0.5 |
| 2 | 2020-06-23 | 0.5 |
| 2 | 2020-06-24 | 0.5 |
| 2 | 2020-06-25 | 0.5 |
+------------+-------------+-------------+
现在,您可以将空白和孤岛应用到这个数据集中,您将得到一个日期、2020-05-22 - 2020-05-25
和2020-06-18 - 2020-06-25
组。你也会得到每个周末的组,但是那些单独的周末的AbsenceDays
之和是零,所以我们可以过滤掉它们。
在这里,我使用ROW_NUMBER
来解决空白和孤岛问题:
最终查询
WITH
CTE_Boundaries
AS
(
SELECT
EmployeeID
,MIN(AbsenceDate) AS StartDate
,MAX(AbsenceDate) AS EndDate
FROM
@T AS T
GROUP BY
EmployeeID
)
,CTE_Weekends
AS
(
SELECT
CTE_Boundaries.EmployeeID
,Calendar.dt AS AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Boundaries
INNER JOIN Calendar
ON Calendar.dt >= CTE_Boundaries.StartDate
AND Calendar.dt <= CTE_Boundaries.EndDate
WHERE
Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
SELECT
EmployeeID
,AbsenceDate
,AbsenceDays
FROM @T AS T
UNION ALL
SELECT
EmployeeID
,AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Weekends
)
,CTE_Data
AS
(
SELECT
EmployeeID
,AbsenceDate
,SUM(AbsenceDays) AS AbsenceDays
FROM CTE_AllDates
GROUP BY
EmployeeID
,AbsenceDate
)
-- apply gaps and islands to CTE_Data
,CTE_RowNumbers
AS
(
SELECT
EmployeeID
,AbsenceDate
,AbsenceDays
,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY AbsenceDate) AS rn1
,DATEDIFF(day, '2020-01-01', AbsenceDate) AS rn2
FROM
CTE_Data
)
SELECT
EmployeeID
,MIN(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS StartAbsenceDate
,MAX(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS EndAbsenceDate
,SUM(AbsenceDays) AS NoOfDays
FROM
CTE_RowNumbers
GROUP BY
EmployeeID
,rn2 - rn1
HAVING
SUM(AbsenceDays) > 0
ORDER BY
EmployeeID
,StartAbsenceDate
;
当范围的第一个或最后一个AbsenceDate
是星期一或星期五时,我们需要使用AbsenceDate
。如果没有这一检查,周末的相邻两天可能会附加到最后的范围内。
结果
+------------+------------------+----------------+----------+
| EmployeeID | StartAbsenceDate | EndAbsenceDate | NoOfDays |
+------------+------------------+----------------+----------+
| 1 | 2020-05-22 | 2020-05-25 | 2.5 |
| 1 | 2020-06-18 | 2020-06-25 | 6 |
| 2 | 2020-06-18 | 2020-06-25 | 3 |
+------------+------------------+----------------+----------+
发布于 2020-06-25 10:55:33
你的数据看上去不对。每天有多行。我猜这是不允许的,他们可能是不同的雇员。
要解决周末问题,可以使用lag()
、累积和和以及一些日期算法:
select EmployeeId, min(AbsenceDate), max(AbsenceDate), sum(AbsenceDays)
from (select t.*,
sum(case when datename(weekday, AbsenceDate) in ('Tuesday', 'Wednesday', 'Thursday', 'Friday') and prev_ad = dateadd(day, -1, AbsenceDate)
then 0
when datename(weekday, AbsenceDate) in ('Monday') and prev_ad = dateadd(day, -3, AbsenceDate)
then 0
else 1
end) over (partition by EmployeeId order by AbsenceDate) as grp
from (select t.*,
lag(AbsenceDate) over (partition by EmployeeId order by AbsenceDate) as prev_ad
from t
) t
) t
group by EmployeeId, grp;
这里是db<>fiddle。根据样本数据,结果看起来是正确的,但与您的问题不同。
https://stackoverflow.com/questions/62572784
复制相似问题