前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >三一重工大数据面试SQL-部门人员数据分析

三一重工大数据面试SQL-部门人员数据分析

作者头像
数据仓库晨曦
发布2024-04-30 12:18:32
760
发布2024-04-30 12:18:32
举报
文章被收录于专栏:数据仓库技术数据仓库技术

一、题目

现有一张员工在职所在部门信息表,包含员工ID、所属部门、开始日期、结束日期,请查询出如下内容

1.2024年1月31日A部门在职员工数;

2.2024年1月份A部门员工最多时有多少员工;

3.2024年1月份A部门平均有多少员工;

代码语言:javascript
复制
+--------------+-------------+-------------+-------------+
| employee_id  | department  | start_date  |  end_date   |
+--------------+-------------+-------------+-------------+
| 1            | A           | 2023-12-20  | 2024-01-22  |
| 2            | A           | 2024-01-02  | 2024-01-11  |
| 2            | B           | 2024-01-11  | 2024-01-25  |
| 2            | A           | 2023-01-25  | 9999-01-01  |
| 3            | A           | 2023-12-20  | 9999-01-01  |
| 4            | A           | 2024-02-02  | 9999-01-01  |
| 5            | A           | 2023-06-20  | 2023-12-22  |
+--------------+-------------+-------------+-------------+

二、分析

  1. 题目本身是一个拉链表的结构,可以认为是一个左闭又开的数据。即开始日期算做在部门内,离开日期不算在部门日期。
  2. 第1问:查询时点数据,我们可以根据记录中的开始日期和结束日期与时间判断,如果时点在区间内,则代表用户在该部门;
  3. 第2问:2024年1月份A部门员工最多时有多少员工,调整表结构为进入离开部门的记录表,并进行计数,进入部门+1,离开部门-1,然后对所有行为进行累积求和,取出出现在1月份的最大值即可;
  4. 第3问:2024年1月份A部门平均有多少员工,存在两种计算方式:1.计算出A部门1月份每天员工数,然后进行求和。2.计算出A部门在1月份的总人 * 天 然后除以1月份天数(31天)。其中计算总人 * 天数可以使用第2问的过程数据,即每个人数状态 * 持续天数来计算。

维度

评分

题目难度

⭐️⭐️⭐️⭐️⭐️

题目清晰度

⭐️⭐️⭐️⭐️⭐️

业务常见度

⭐️⭐️⭐️⭐️⭐️

三、SQL

1.2024年1月31日A部门在职员工数

通过判断2024-01-31>=start_date并且2024-01-31<end_date 证明在部门内。sql如下

代码语言:javascript
复制
select count(1) as a_depart_num
from employee_department_info
where department = 'A'
and start_date <= '2024-01-31'
and end_date >'2024-01-31';

查询结果

代码语言:javascript
复制
+---------------+
| a_depart_num  |
+---------------+
| 2             |
+---------------+

2.2024年1月份A部门员工最多时有多少员工;

2.1 生成员工进入离开部门表

生成员工加入离开部门表,表内包含employee_id, department, enter_or_leave,action_date。其中enter_type 1代表进入,-1代表离开,进入时间用start_date,离开时间用end_date。通过查询2遍员工部门表,并通过union all 来整合到一起。

代码语言:javascript
复制
--加入部门记录
select
  employee_id as employee_id,
  department as department,
  1 as enter_or_leave,
  start_date as action_date
from employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from employee_department_info
where department = 'A'

查询结果

代码语言:javascript
复制
+------------------+-----------------+---------------------+------------------+
| _u1.employee_id  | _u1.department  | _u1.enter_or_leave  | _u1.action_date  |
+------------------+-----------------+---------------------+------------------+
| 1                | A               | 1                   | 2023-12-20       |
| 1                | A               | -1                  | 2024-01-22       |
| 2                | A               | 1                   | 2024-01-02       |
| 2                | A               | -1                  | 2024-01-11       |
| 2                | A               | 1                   | 2023-01-25       |
| 2                | A               | -1                  | 9999-01-01       |
| 3                | A               | 1                   | 2023-12-20       |
| 3                | A               | -1                  | 9999-01-01       |
| 4                | A               | 1                   | 2024-02-02       |
| 4                | A               | -1                  | 9999-01-01       |
| 5                | A               | 1                   | 2023-06-20       |
| 5                | A               | -1                  | 2023-12-22       |
+------------------+-----------------+---------------------+------------------+

2.2 添加一条无状态数据,保证1月份有记录,增加两条A部门月初月末无人员变动记录

因为后面使用数据累积,也就是只有在数据变化的时候的才有记录,所以我们增加两条无人员变动记录,employee_id = 0 ,enter_or_leave = 0 代表该用户既不是进入,也不是离开。 时间分别是月初和月末。这样即能保证1月份肯定有数据,也能保证有1月份的初始和结束状态。

代码语言:javascript
复制
--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from employee_department_info
where department = 'A'
--月初记录
union all 
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all 
--月末记录
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date

2.3 使用累加方式计算每次变动之后A部门的人数

3)使用sum()over(order by **) 的方式,对A部门的每次人数变化进行累积求和

代码语言:javascript
复制
with t as (
--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from employee_department_info
where department = 'A'
--月初记录
union all 
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all 
--月末记录
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date
)
select 
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave order by action_date asc) as depart_emp_cnt
from t

查询结果

代码语言:javascript
复制
+------------------+-----------------+---------------------+------------------+
| _u1.employee_id  | _u1.department  | _u1.enter_or_leave  | _u1.action_date  |
+------------------+-----------------+---------------------+------------------+
| 1                | A               | 1                   | 2023-12-20       |
| 1                | A               | -1                  | 2024-01-22       |
| 2                | A               | 1                   | 2024-01-02       |
| 2                | A               | -1                  | 2024-01-11       |
| 2                | A               | 1                   | 2023-01-25       |
| 2                | A               | -1                  | 9999-01-01       |
| 3                | A               | 1                   | 2023-12-20       |
| 3                | A               | -1                  | 9999-01-01       |
| 4                | A               | 1                   | 2024-02-02       |
| 4                | A               | -1                  | 9999-01-01       |
| 5                | A               | 1                   | 2023-06-20       |
| 5                | A               | -1                  | 2023-12-22       |
| 0                | A               | 0                   | 2024-01-01       |
| 0                | A               | 0                   | 2024-01-31       |
+------------------+-----------------+---------------------+------------------+

2.4 时间段限定在1月份,对depart_emp_cnt 求最大值。

代码语言:javascript
复制
with t as (
--加入部门记录
    select employee_id,
           department,
           1          as enter_or_leave,
           start_date as action_date
    from employee_department_info
    where department = 'A'
    union all
-- 离开部门记录
    select employee_id,
           department,
           -1       as enter_or_leave,
           end_date as action_date
    from employee_department_info
    where department = 'A'
--月初记录
    union all
    select 0            as employee_id,
           'A'          as department,
           0            as enter_or_leave,
           '2024-01-01' as action_date
    union all
--月末记录
    select 0            as employee_id,
           'A'          as department,
           0            as enter_or_leave,
           '2024-01-31' as action_date
)
select max(depart_emp_cnt) as max_emp_cnt
from (
select
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave)over(order by action_date asc) as depart_emp_cnt
from t) tt
where action_date >= '2024-01-01'
and action_date <= '2024-01-31'

查询结果

代码语言:javascript
复制
+--------------+
| max_emp_cnt  |
+--------------+
| 4            |
+--------------+

3.2024年1月份A部门平均有多少员工;

3.1 求每个阶段员工人数持续天数

求平均有多少员工,我们可以根据2.3的结果进行计算,查看每个阶段(两次员工变化之间的日期为同一个阶段)的人数和持续天数。然后相乘,再求和得出最终1月份在职员工的人*天,然后除以1月份天数得出平均在职人数。使用lead函数,计算出下一次变动日期,然后使用下一次变化日期-当前日期即为当前状态持续时间。这里因为在月末的时候,人数持续一天,所以我们需要在原始记录中增加2024-02-01一条无人员变动记录。

代码语言:javascript
复制
with t as (
--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from employee_department_info
where department = 'A'
--月初记录
union all
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all
--月末记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date
union all
--2月初记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-02-01' as action_date
),
t2 as(
select
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave) over( order by action_date asc) as depart_emp_cnt
from t)
select
employee_id,
department,
enter_or_leave,
action_date,
depart_emp_cnt,
datediff(lead(action_date)over(order by action_date asc),action_date) as keep_days
from t2

查询结果

代码语言:javascript
复制
+--------------+-------------+-----------------+--------------+-----------------+------------+
| employee_id  | department  | enter_or_leave  | action_date  | depart_emp_cnt  | keep_days  |
+--------------+-------------+-----------------+--------------+-----------------+------------+
| 2            | A           | 1               | 2023-01-25   | 1               | 146        |
| 5            | A           | 1               | 2023-06-20   | 2               | 183        |
| 3            | A           | 1               | 2023-12-20   | 4               | 0          |
| 1            | A           | 1               | 2023-12-20   | 4               | 2          |
| 5            | A           | -1              | 2023-12-22   | 3               | 10         |
| 0            | A           | 0               | 2024-01-01   | 3               | 1          |
| 2            | A           | 1               | 2024-01-02   | 4               | 9          |
| 2            | A           | -1              | 2024-01-11   | 3               | 11         |
| 1            | A           | -1              | 2024-01-22   | 2               | 9          |
| 0            | A           | 0               | 2024-01-31   | 2               | 1          |
| 0            | A           | 0               | 2024-02-01   | 2               | 1          |
| 4            | A           | 1               | 2024-02-02   | 3               | 2912777    |
| 4            | A           | -1              | 9999-01-01   | 0               | 0          |
| 3            | A           | -1              | 9999-01-01   | 0               | 0          |
| 2            | A           | -1              | 9999-01-01   | 0               | NULL       |
+--------------+-------------+-----------------+--------------+-----------------+------------+

3.2计算最终结果

查询1月份的日期数据,然后对depart_emp_cnt* last_date 求和,再除以31 即为1月份的平均在职人数

代码语言:javascript
复制
with t as (
--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from employee_department_info
where department = 'A'
--月初记录
union all
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all
--月末记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date
union all
--2月初记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-02-01' as action_date
),
t2 as(
select
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave) over( order by action_date asc) as depart_emp_cnt
from t),
t3 as (
select
employee_id,
department,
enter_or_leave,
action_date,
depart_emp_cnt,
datediff(lead(action_date)over(order by action_date asc),action_date) as keep_days
from t2)
select
sum(depart_emp_cnt* keep_days)/31 as avg_emp_cnt
from t3
where action_date >='2024-01-01' and action_date <= '2024-01-31'

查询结果

代码语言:javascript
复制
+--------------------+
|    avg_emp_cnt     |
+--------------------+
| 2.967741935483871  |
+--------------------+

四、建表语句和数据插入

代码语言:javascript
复制
CREATE TABLE IF NOT EXISTS employee_department_info (
    employee_id INT, -- 员工ID
    department STRING, -- 所属部门
    start_date STRING, -- 开始日期
    end_date STRING -- 结束日期
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','  -- 假设字段使用逗号分隔
STORED AS ORC;

insert into employee_department_info(employee_id, department, start_date,end_date) values 
(1, 'A', '2023-12-20','2024-01-22'),
(2, 'A', '2024-01-02','2024-01-11'),
(2, 'B', '2024-01-11','2024-01-25'),
(2, 'A', '2023-01-25','9999-01-01'),
(3, 'A', '2023-12-20','9999-01-01'),
(4, 'A', '2024-02-02','9999-01-01'),
(5, 'A', '2023-06-20','2023-12-22');
本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2024-04-29,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 数据仓库技术 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 一、题目
  • 二、分析
  • 三、SQL
    • 1.2024年1月31日A部门在职员工数
      • 2.2024年1月份A部门员工最多时有多少员工;
        • 2.1 生成员工进入离开部门表
        • 2.2 添加一条无状态数据,保证1月份有记录,增加两条A部门月初月末无人员变动记录
        • 2.3 使用累加方式计算每次变动之后A部门的人数
        • 2.4 时间段限定在1月份,对depart_emp_cnt 求最大值。
      • 3.2024年1月份A部门平均有多少员工;
        • 3.1 求每个阶段员工人数持续天数
        • 3.2计算最终结果
    • 四、建表语句和数据插入
    相关产品与服务
    大数据
    全栈大数据产品,面向海量数据场景,帮助您 “智理无数,心中有数”!
    领券
    问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档