Q: How to calculate the duration of employment

Stack Overflow user

Asked on 2021-08-12 08:22:23

Answers: 2, Views: 150, Followers: 0, Votes: 0

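I have data on people's employment status, recorded monthly over one year. The data contain 4 variables: a person's ID, the country, the month, and the main activity in that month (employed, unemployed, inactive, other). Here is an example: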

# Three observations per person, matching the 30 rows printed below
ID <- rep(1:10, 3)
country <- rep(c("AT", "BE", "CH", "CZ", "HR", "SO", "SV", "RU", "GR", "GE"), 3)
month <- c("Jan", "Feb", "Mar", "Apr", "May", "Aug", "Dec", "Nov", "Sep", "Jan",
           "Jun", "Jul", "Oct", "Jan", "Feb", "Mar", "Apr", "May", "Aug", "Dec",
           "Nov", "Sep", "Jan", "Jun", "Jul", "Oct", "Nov", "Jan", "Jun", "Aug")
act <- c("Unemployed", "Employed", "Other", "Other", "Inactive", "Unemployed", "Employed",
         "Employed", "Employed", "Unemployed", "Other", "Unemployed", "Unemployed", "Unemployed",
         "Other", "Other", "Employed", "Other", "Other", "NA",
         "Unemployed", "Unemployed", "NA", "Unemployed", "Other", "Employed", "Employed",
         "NA", "Inactive", "Unemployed")
df <- data.frame(ID, country, month, act)
df[order(df$ID), ]

   ID country month        act
1   1      AT   Jan Unemployed
11  1      AT   Jun      Other
21  1      AT   Nov Unemployed
2   2      BE   Feb   Employed
12  2      BE   Jul Unemployed
22  2      BE   Sep Unemployed
3   3      CH   Mar      Other
13  3      CH   Oct Unemployed
23  3      CH   Jan         NA
4   4      CZ   Apr      Other
14  4      CZ   Jan Unemployed
24  4      CZ   Jun Unemployed
5   5      HR   May   Inactive
15  5      HR   Feb      Other
25  5      HR   Jul      Other
6   6      SO   Aug Unemployed
16  6      SO   Mar      Other
26  6      SO   Oct   Employed
7   7      SV   Dec   Employed
17  7      SV   Apr   Employed
27  7      SV   Nov   Employed
8   8      RU   Nov   Employed
18  8      RU   May      Other
28  8      RU   Jan         NA
9   9      GR   Sep   Employed
19  9      GR   Aug      Other
29  9      GR   Jun   Inactive
10 10      GE   Jan Unemployed
20 10      GE   Dec         NA
30 10      GE   Aug Unemployed

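My goal is to create a new data frame in which each row represents one spell of employment, on the condition that the spell is both preceded and followed by a period of unemployment. That way I include only the employment spells where a person moved from unemployment into employment and back into unemployment, and I can calculate how long those spells lasted. Ideally, the result would have these variables: PersID, country, spell duration, starting month, ending month. It should look like this: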

   ID country spell_duration starting ending
1   1      AT              5      Jan    May
11  1      AT              5      Jun    Oct
2   2      BE              7      Feb    Aug
12  2      BE              6      Jul    Dec
3   3      CH             10      Mar    Dec
13  3      CH              1      Oct    Oct
4   4      CZ              8      Apr    Nov
14  4      CZ              5      Jan    May
5   5      HR              5      May    Sep
15  5      HR              4      Feb    May
6   6      SO              2      Aug    Sep
16  6      SO              6      Mar    Aug
7   7      SV              1      Dec    Dec
17  7      SV              9      Apr    Dec
8   8      RU              8      Nov    Dec
18  8      RU              7      May    Nov
9   9      GR              3      Sep    Nov
19  9      GR              2      Aug    Sep
10 10      GE              8      Jan    Aug
20 10      GE              1      Dec    Dec

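I have already found Maria's solution (How to calculate number and duration of categorical spells by ID in R), but her problem is different. I don't want the total time spent in employment, and I don't need the number of spells either.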
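The condition described in the question (an employment spell directly preceded and followed by unemployment) can be sketched for a single person with base R's rle(). This is only an illustration, not code from either answer: the helper name find_bounded_spells is hypothetical, and it assumes one activity per month, in chronological order, with no missing months.

```r
# Hypothetical helper: for ONE person's activities in chronological order,
# return the employment spells that sit directly between two
# unemployment spells.
find_bounded_spells <- function(act) {
  runs   <- rle(act)                        # collapse consecutive repeats into runs
  n_runs <- length(runs$values)
  ends   <- cumsum(runs$lengths)            # last position of each run
  starts <- ends - runs$lengths + 1         # first position of each run
  prev_val <- c(NA, runs$values[-n_runs])   # activity of the preceding run
  next_val <- c(runs$values[-1], NA)        # activity of the following run
  # %in% treats NA as "no match", so the first and last runs never qualify
  keep <- runs$values %in% "Employed" &
    prev_val %in% "Unemployed" &
    next_val %in% "Unemployed"
  data.frame(start = starts[keep], end = ends[keep],
             duration = runs$lengths[keep])
}

# Made-up 12-month history for one person (Jan to Dec)
act <- c("Unemployed", "Unemployed", "Employed", "Employed", "Employed",
         "Unemployed", "Other", "Employed", "Unemployed", "Employed",
         "Employed", "Unemployed")
spells <- find_bounded_spells(act)
spells$starting <- month.abb[spells$start]
spells$ending   <- month.abb[spells$end]
print(spells)
```

The employment month in August is dropped because it follows "Other" rather than "Unemployed". To cover all people, the helper could be applied per ID, for example after split(df, df$ID), once each person's rows are sorted by month.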

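2 Answers

Stack Overflow user

Answered on 2021-08-12 09:06:09

This is the first thing that came to mind, without much thought. It is cumbersome, though. I'm sure there are more elegant solutions, but this one does not require any additional packages.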

data <- df
Empl_spells <- data.frame(ID = c(), Start = c(), End = c())

for(user in unique(data$ID)){
  # subset per user
  user_dat <- data[data$ID == user,]
  # initiate a list to store where changes occur and a counter for
  # entries to this list
  if(nrow(user_dat) > 2){
    Changes_data <- list()
    entry <- 1
    # for every row, check if it switches from employed to unemployed
    # or the opposite. Mark with "break" if some other entry interrupts
    for(i in 2:nrow(user_dat)){
      if(user_dat$act[i] == "Employed" &
         user_dat$act[i-1] == "Unemployed"){
        Changes_data[[entry]] <- c("Start", i)
        entry <- entry + 1
      }else if(user_dat$act[i] == "Unemployed" &
               user_dat$act[i-1] == "Employed"){
        Changes_data[[entry]] <- c("End", i)
        entry <- entry + 1
      }else if(user_dat$act[i] != "Employed" &
               user_dat$act[i] != "Unemployed"){
        Changes_data[[entry]] <- c("Break", i)
        entry <- entry + 1
      }
    }
    # find where an "End" immediately follows a "Start" in the list of changes;
    # skip users with fewer than two recorded changes (an empty list would
    # otherwise fail when indexed below)
    EmplToUnempl <- integer(0)
    if(length(Changes_data) >= 2){
      Changes_df <- do.call(rbind.data.frame, Changes_data)
      EmplToUnempl <- which(Changes_df[-nrow(Changes_df), 1] == "Start" & Changes_df[-1, 1] == "End")
    }
    if(length(EmplToUnempl) >= 1){
      append <- data.frame(ID = user,
                           Start = user_dat$month[as.numeric(as.character(Changes_df[EmplToUnempl, 2]))],
                           End = user_dat$month[as.numeric(as.character(Changes_df[EmplToUnempl + 1, 2]))-1])
      # append the data to the data.frame for all of the people
      Empl_spells <- rbind(Empl_spells, append)
    }
  }
}

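I haven't tested this, because I don't have your data. Is this what you want?

Edit (the same logic wrapped in a function and applied with lapply instead of a for loop over users; this might make it faster):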

data <- df

users <- unique(data$ID)
calculate <- function(user){
  # subset per user
  user_dat <- data[data$ID == user,]
  # initiate a list to store where changes occur and a counter for
  # entries to this list
  if(nrow(user_dat) > 2){
    Changes_data <- list()
    entry <- 1
    # for every row, check if it switches from employed to unemployed
    # or the opposite. Mark with "break" if some other entry interrupts
    for(i in 2:nrow(user_dat)){
      if(user_dat$act[i] == "Employed" &
         user_dat$act[i-1] == "Unemployed"){
        Changes_data[[entry]] <- c("Start", i)
        entry <- entry + 1
      }else if(user_dat$act[i] == "Unemployed" &
               user_dat$act[i-1] == "Employed"){
        Changes_data[[entry]] <- c("End", i)
        entry <- entry + 1
      }else if(user_dat$act[i] != "Employed" &
               user_dat$act[i] != "Unemployed"){
        Changes_data[[entry]] <- c("Break", i)
        entry <- entry + 1
      }
    }
    # find where an "End" immediately follows a "Start" in the list of changes;
    # skip users with fewer than two recorded changes (an empty list would
    # otherwise fail when indexed below)
    EmplToUnempl <- integer(0)
    if(length(Changes_data) >= 2){
      Changes_df <- do.call(rbind.data.frame, Changes_data)
      EmplToUnempl <- which(Changes_df[-nrow(Changes_df), 1] == "Start" & Changes_df[-1, 1] == "End")
    }
    if(length(EmplToUnempl) >= 1){
      append <- data.frame(ID = user,
                           Start = user_dat$month[as.numeric(as.character(Changes_df[EmplToUnempl, 2]))],
                           End = user_dat$month[as.numeric(as.character(Changes_df[EmplToUnempl + 1, 2]))-1])
      # append the data to the data.frame for all of the people
      return(append)
    }
  }
}

empl_spells <- lapply(users, FUN = calculate)
Empl_spells <- do.call(rbind.data.frame, empl_spells)

Edit #2 (calculating the duration):

MonthToNumeric <- function(x){
  which(c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") == x)
}

calcDuration <- function(Start, End){
  return(MonthToNumeric(End) - MonthToNumeric(Start) + 1)
}

Empl_spells$Duration <- mapply(FUN = calcDuration, Start = Empl_spells[, 2], End = Empl_spells[, 3])
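As a possible alternative to the mapply call, the duration can be computed for whole vectors at once with match() and R's built-in month.abb constant. This is a minimal sketch: the names month_to_int and spell_duration are made up for illustration, and it assumes that start and end fall in the same calendar year, which seems reasonable since the data cover a single year.

```r
# Map month abbreviations to 1..12 using the built-in month.abb constant
month_to_int <- function(x) match(x, month.abb)

# Inclusive duration in months. Assumption: start and end lie in the same
# calendar year, so the end month is never earlier than the start month.
spell_duration <- function(start, end) {
  s <- month_to_int(start)
  e <- month_to_int(end)
  stopifnot(!anyNA(s), !anyNA(e), all(e >= s))
  e - s + 1L
}

spell_duration(c("Jan", "Jun", "Dec"), c("May", "Oct", "Dec"))
# returns 5 5 1
```

Because match() is vectorized, the function can be called directly on the Start and End columns without a per-row loop.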

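Votes: 0

Stack Overflow user

Answered on 2021-08-12 09:27:27

I managed this using the data.table package; I guess a loop would work as well.

Edit: I had left in an extra "}"; I have removed it. I tested the code and it works.

Edit 2: I also added "setDT(df)".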

library(data.table)

 df <- fread(paste("ID country month        act
    1      AT   Jan Unemployed
    1      AT   Jun      Other
    1      AT   Nov Unemployed
    2      BE   Feb   Employed
    2      BE   Jul Unemployed
    2      BE   Sep Unemployed
    3      CH   Mar      Other
    3      CH   Oct Unemployed
    3      CH   Jan         NA
    4      CZ   Apr      Other
    4      CZ   Jan Unemployed
    4      CZ   Jun Unemployed
    5      HR   May   Inactive
    5      HR   Feb      Other
    5      HR   Jul      Other
    6      SO   Aug Unemployed
    6      SO   Mar      Other
    6      SO   Oct   Employed
    7      SV   Dec   Employed
    7      SV   Apr   Employed
    7      SV   Nov   Employed
    8      RU   Nov   Employed
    8      RU   May      Other
    8      RU   Jan         NA
    9      GR   Sep   Employed
    9      GR   Aug      Other
    9      GR   Jun   Inactive
    10      GE   Jan Unemployed
    10      GE   Dec         NA
    10      GE   Aug Unemployed", collapse = '\n'))
setDT(df)
df[, monthInt := match(month, month.abb)]

df <- df[order(ID,monthInt)]

finalDt <- data.table()
for (i in unique(df[, ID])) {
  
  tempT <- df[ID == i]
  
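  # seq_len() yields an empty sequence for a person with a single record,
  # whereas 1:(nrow(tempT)-1) would iterate over 1 and 0
  for (tim in seq_len(nrow(tempT) - 1)) {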
    timT <- data.table(ID = tempT[tim,ID],
                       country = tempT[tim, country],
                       spell_duration = tempT[tim+1, monthInt] - tempT[tim, monthInt],
                       starting = month.abb[tempT[tim, monthInt]],
                       ending = month.abb[tempT[tim+1, monthInt]-1])
    
    finalDt <- rbind(finalDt,timT)
    
  }
}
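The loop above pairs each record with the next record of the same ID, without looking at the act column. The same pairing can be sketched in base R without growing a table row by row. This is only an illustration under assumptions: the function name gaps_between_records is hypothetical, the toy data reuse the months of IDs 1 and 2 from the question, and each person is assumed to have at most one record per month.

```r
# Hypothetical helper: pair every record with the next record of the same
# person and report the gap between them, mirroring the loop above.
# Assumes at most one record per person per month.
gaps_between_records <- function(d) {
  d$monthInt <- match(d$month, month.abb)
  d <- d[order(d$ID, d$monthInt), ]
  cur <- seq_len(max(nrow(d) - 1, 0))   # candidate "current" rows
  nxt <- cur + 1                        # the row that follows each of them
  same <- d$ID[cur] == d$ID[nxt]        # keep pairs within the same person
  cur <- cur[same]
  nxt <- nxt[same]
  data.frame(ID = d$ID[cur],
             country = d$country[cur],
             spell_duration = d$monthInt[nxt] - d$monthInt[cur],
             starting = month.abb[d$monthInt[cur]],
             ending = month.abb[d$monthInt[nxt] - 1])
}

# Toy input, deliberately unsorted
d <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                country = c("AT", "AT", "AT", "BE", "BE", "BE"),
                month = c("Jun", "Jan", "Nov", "Feb", "Sep", "Jul"))
res <- gaps_between_records(d)
print(res)
```

To restrict the result to employment spells bounded by unemployment, as the question asks, the act values of the current, previous and next records would additionally need to be checked.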

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/68753941
