首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >是否有办法解决日期范围内的空隙和岛屿问题,而忽略周末?

是否有办法解决日期范围内的空隙和岛屿问题,而忽略周末?
EN

Stack Overflow用户
提问于 2020-06-25 10:02:15
回答 2查看 314关注 0票数 1

我的客户有一个出勤系统,它(大约)在此表单中存储缺勤数据(换句话说,在一天或半天之前):

代码语言:javascript
运行
复制
EmployeeID   AbsenceDate   AbsenceDays
1            2020-06-25    1
1            2020-06-24    1
1            2020-06-23    1
1            2020-06-22    1
1            2020-06-19    1
1            2020-06-18    1
1            2020-05-25    1
1            2020-06-23    1
1            2020-06-22    0.5

我构建了一份报告,输出“原样”数据,但客户已经询问是否可以采用这种形式(将连续的相关天数汇总成一个包含sum的范围):

代码语言:javascript
运行
复制
EmployeeID   StartDate   EndDate       NoOfDays
1            2020-06-18  2020-06-25    6
1            2020-05-22  2020-06-25    2.5

我已经研究了空隙和孤岛解决方案,但困难的是,对于这两种解决方案,都有一个中间的周末,在这个周末,缺勤数据应该被计算在内,而不是,而不是。是否有任何方法可以使用标准SQL (而不是使用游标或其他ROBAR解决方案,因为显而易见的原因,我宁愿避免)。

EN

回答 2

Stack Overflow用户

发布于 2020-06-25 12:52:54

首先,这种分组可以相对容易地在客户端使用您的经典编程语言,而不是SQL。但是如果你坚持..。

我已经研究了空隙和孤岛解决方案,但困难的是,对于这两种解决方案,都有一个中间的周末,在这个周末,缺勤数据应该被计算在内,而不是,而不是

其主要思想是为所有周末生成缺失的行,AbsenceDays值为0,这样,当周末结束时,间隙和岛屿就不会创建额外的范围。

为此,我将使用日历表(一个包含所有日期和各种标志(如IsWeekend)的表)。

请注意,即使周末有一些AbsenceDates,这种方法也会返回正确的结果。

样本数据

我已经调整了你的样本数据,使它更有趣和明确。(您的示例为同一个EmployeeID列出了两次相同的日期)

代码语言:javascript
运行
复制
DECLARE @T TABLE (EmployeeID int, AbsenceDate date, AbsenceDays float);

INSERT INTO @T
VALUES
(2, '2020-06-25', 0.5),
(2, '2020-06-24', 0.5),
(2, '2020-06-23', 0.5),
(2, '2020-06-22', 0.5),
(2, '2020-06-19', 0.5),
(2, '2020-06-18', 0.5),
-- here we go across the weekend and both Sat and Sun are skipped

(1, '2020-06-25', 1),
(1, '2020-06-24', 1),
(1, '2020-06-23', 1),
(1, '2020-06-22', 1),
(1, '2020-06-19', 1),
(1, '2020-06-18', 1),
-- here we go across the weekend and both Sat and Sun are skipped

(1, '2020-05-25', 1),
(1, '2020-05-23', 1),
(1, '2020-05-22', 0.5);
-- here we go across the weekend and only Sun is skipped

查询

此查询对所有日期使用Calendar表和dt,并使用标志IsWeekend

CTE_Boundaries计算每个员工从日历中需要的日期范围。CTE_Weekends给我们每个星期六和太阳一排。最后,我们将源表和日历中的日期组合在一起。

代码语言:javascript
运行
复制
WITH
CTE_Boundaries
AS
(
    SELECT
        EmployeeID
        ,MIN(AbsenceDate) AS StartDate
        ,MAX(AbsenceDate) AS EndDate
    FROM
        @T AS T
    GROUP BY
        EmployeeID
)
,CTE_Weekends
AS
(
    SELECT
        CTE_Boundaries.EmployeeID
        ,Calendar.dt AS AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Boundaries
        INNER JOIN Calendar
            ON  Calendar.dt >= CTE_Boundaries.StartDate
            AND Calendar.dt <= CTE_Boundaries.EndDate
    WHERE
        Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
    FROM @T AS T

    UNION ALL

    SELECT
        EmployeeID
        ,AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Weekends
)
SELECT
    EmployeeID
    ,AbsenceDate
    ,SUM(AbsenceDays) AS AbsenceDays
FROM CTE_AllDates
GROUP BY
    EmployeeID
    ,AbsenceDate
;

结果

代码语言:javascript
运行
复制
+------------+-------------+-------------+
| EmployeeID | AbsenceDate | AbsenceDays |
+------------+-------------+-------------+
|          1 | 2020-05-22  |         0.5 |
|          1 | 2020-05-23  |           1 |
|          1 | 2020-05-24  |           0 |
|          1 | 2020-05-25  |           1 |
|          1 | 2020-05-30  |           0 |
|          1 | 2020-05-31  |           0 |
|          1 | 2020-06-06  |           0 |
|          1 | 2020-06-07  |           0 |
|          1 | 2020-06-13  |           0 |
|          1 | 2020-06-14  |           0 |
|          1 | 2020-06-18  |           1 |
|          1 | 2020-06-19  |           1 |
|          1 | 2020-06-20  |           0 |
|          1 | 2020-06-21  |           0 |
|          1 | 2020-06-22  |           1 |
|          1 | 2020-06-23  |           1 |
|          1 | 2020-06-24  |           1 |
|          1 | 2020-06-25  |           1 |
|          2 | 2020-06-18  |         0.5 |
|          2 | 2020-06-19  |         0.5 |
|          2 | 2020-06-20  |           0 |
|          2 | 2020-06-21  |           0 |
|          2 | 2020-06-22  |         0.5 |
|          2 | 2020-06-23  |         0.5 |
|          2 | 2020-06-24  |         0.5 |
|          2 | 2020-06-25  |         0.5 |
+------------+-------------+-------------+

现在,您可以将空白和孤岛应用到这个数据集中,您将得到一个日期、2020-05-22 - 2020-05-252020-06-18 - 2020-06-25组。你也会得到每个周末的组,但是那些单独的周末的AbsenceDays之和是零,所以我们可以过滤掉它们。

在这里,我使用ROW_NUMBER来解决空白和孤岛问题:

最终查询

代码语言:javascript
运行
复制
WITH
CTE_Boundaries
AS
(
    SELECT
        EmployeeID
        ,MIN(AbsenceDate) AS StartDate
        ,MAX(AbsenceDate) AS EndDate
    FROM
        @T AS T
    GROUP BY
        EmployeeID
)
,CTE_Weekends
AS
(
    SELECT
        CTE_Boundaries.EmployeeID
        ,Calendar.dt AS AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Boundaries
        INNER JOIN Calendar
            ON  Calendar.dt >= CTE_Boundaries.StartDate
            AND Calendar.dt <= CTE_Boundaries.EndDate
    WHERE
        Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
    FROM @T AS T

    UNION ALL

    SELECT
        EmployeeID
        ,AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Weekends
)
,CTE_Data
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,SUM(AbsenceDays) AS AbsenceDays
    FROM CTE_AllDates
    GROUP BY
        EmployeeID
        ,AbsenceDate
)

-- apply gaps and islands to CTE_Data
,CTE_RowNumbers
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY AbsenceDate) AS rn1
        ,DATEDIFF(day, '2020-01-01', AbsenceDate) AS rn2
    FROM
        CTE_Data
)
SELECT
    EmployeeID
    ,MIN(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS StartAbsenceDate
    ,MAX(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS EndAbsenceDate
    ,SUM(AbsenceDays) AS NoOfDays
FROM
    CTE_RowNumbers
GROUP BY
    EmployeeID
    ,rn2 - rn1
HAVING
    SUM(AbsenceDays) > 0
ORDER BY
    EmployeeID
    ,StartAbsenceDate
;

当范围的第一个或最后一个AbsenceDate是星期一或星期五时,我们需要使用AbsenceDate。如果没有这一检查,周末的相邻两天可能会附加到最后的范围内。

结果

代码语言:javascript
运行
复制
+------------+------------------+----------------+----------+
| EmployeeID | StartAbsenceDate | EndAbsenceDate | NoOfDays |
+------------+------------------+----------------+----------+
|          1 | 2020-05-22       | 2020-05-25     |      2.5 |
|          1 | 2020-06-18       | 2020-06-25     |        6 |
|          2 | 2020-06-18       | 2020-06-25     |        3 |
+------------+------------------+----------------+----------+
票数 1
EN

Stack Overflow用户

发布于 2020-06-25 10:55:33

你的数据看上去不对。每天有多行。我猜这是不允许的,他们可能是不同的雇员。

要解决周末问题,可以使用lag()、累积和和以及一些日期算法:

代码语言:javascript
运行
复制
select EmployeeId, min(AbsenceDate), max(AbsenceDate), sum(AbsenceDays)
from (select t.*,
             sum(case when datename(weekday, AbsenceDate) in ('Tuesday', 'Wednesday', 'Thursday', 'Friday') and prev_ad = dateadd(day, -1, AbsenceDate)
                      then 0
                      when datename(weekday, AbsenceDate) in ('Monday') and prev_ad = dateadd(day, -3, AbsenceDate)
                      then 0
                      else 1
                 end) over (partition by EmployeeId order by AbsenceDate) as grp
      from (select t.*,
                   lag(AbsenceDate) over (partition by EmployeeId order by AbsenceDate) as prev_ad
            from t
           ) t
     ) t
group by EmployeeId, grp;

这里是db<>fiddle。根据样本数据,结果看起来是正确的,但与您的问题不同。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/62572784

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档