Q: Find the set of rows within a row-specific range, with restrictions at different levels

Stack Overflow user

Asked 2022-08-16 19:25:50

2 answers, 89 views, 0 followers, 0 votes

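I have a dataset in which each row is identified by a hospitalization id and a physician id. Each row also contains the admission and discharge dates and the hospital where the hospitalization took place. A hospitalization may involve several physicians, and a physician may work in several hospitals.

I have another dataset with the specialties of each physician (e.g., clinician, cardiologist). A physician may have more than one specialty.

For each hospitalization-physician row, I would like to know the ids of all other hospitalizations that were done by other physicians of the same specialty and were completed in the 30 days before the start of that hospitalization.

Drawing in part on the solution in R (dplyr): find all rows in row-specific range *WITH RESTRICTION*, I managed to write code that finds all hospitalizations done by other physicians in the same hospital in the 30 days before the start of the hospitalization. For each row, I first find the list of all hospitalizations in the given hospital within the 30-day window. Then I find the list of only those hospitalizations in which the ego physician took part. Finally, I keep the elements of the first list that are not in the second.

I want to adapt the code to find the hospitalizations of other physicians who share at least one specialty with the ego physician. Ideally, I would change the first step of the code above so that it finds the list of all hospitalizations within the ego physician's specialties. I could then use the rest of the code as is, since it subtracts the ego physician's hospitalizations from that list. The main difficulty is that a physician may have several specialties; otherwise it would be enough to add one more variable to the filter call.

Below is the code I have now. It does not take the ego physician's specialties into account.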

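library(dplyr)  # mutate, filter, select, rename, left_join
library(purrr)  # pmap
library(tidyr)  # unnest
library(furrr)  # future_map2

df <- data.frame(hospitalization_id = c(1, 2, 3,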
                                        1, 2, 3,
                                        4, 5, 
                                        6, 7, 8),
                 hospital_id = c("A", "A", "A", 
                                 "A", "A", "A", 
                                 "A", "A",
                                 "B", "B", "B"),
                 physician_id = c(1, 1, 1, 
                                  2, 2, 2,
                                  3, 3, 
                                  2, 2, 2),
                 date_start = as.Date(c("2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-12", "2000-01-20",
                                        "2000-02-10", "2000-02-11", "2000-02-12")),
                 date_end = as.Date(c("2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-18", "2000-01-22",
                                      "2000-02-11", "2000-02-14", "2000-02-17")))

df2 <- df %>%
  mutate(
    # Generates 30-day time interval before start of given hospitalization 
    date_range1 = date_start - 30,
    date_range2 = date_start - 1,
    # List of all hospitalizations in given hospital, in time interval
    hospid_all = pmap(list(date_range1, date_range2, hospital_id),
                      function(x, y, z) filter(df,
                                               date_end >= x & date_end <= y,
                                               hospital_id == z)$hospitalization_id),
    hospid_all = lapply(hospid_all, unique),
    # List of ego's hospitalizations in given hospital, in time interval
    hospid_ego = pmap(list(date_range1, date_range2, hospital_id, physician_id),
                      function(x, y, z, p) filter(df,
                                                  date_end >= x & date_end <= y,
                                                  hospital_id == z,
                                                  physician_id == p)$hospitalization_id),
    # List of peers' hospitalizations in given hospital, in time interval
    hospid_peer = future_map2(hospid_all, hospid_ego, ~ .x[!(.x %in% .y)])) %>%
  select(-starts_with('date_'), -hospid_all, -hospid_ego) %>% # only keep peers' list of hospitalization
  rename('ego'='physician_id')

df3 <- df2 %>%
  select(hospitalization_id, hospital_id, ego, hospid_peer) %>%
  unnest(hospid_peer, keep_empty = TRUE)

df4 <- df3 %>%
  left_join(select(df, hospitalization_id, physician_id), 
            by=c('hospid_peer'='hospitalization_id')) %>%
  rename(alter = physician_id)
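
The subtraction step that builds hospid_peer can be looked at in isolation. The sketch below is not part of the original post: the two input vectors are worked out by hand for row 3 of the sample df (hospitalization 3, physician 1, hospital A, starting 2000-01-20), so treat them as an illustration rather than as captured output.

```r
# Hospitalizations in hospital A that ended between 1999-12-21 and 2000-01-19,
# i.e. inside the 30-day window before 2000-01-20 (worked out by hand from df).
hospid_all <- c(1, 2, 4)
# Those among them in which the ego (physician 1) took part.
hospid_ego <- c(1, 2)

# Keep the elements of the first list that are absent from the second,
# the same expression that is passed to future_map2() above.
hospid_peer <- hospid_all[!(hospid_all %in% hospid_ego)]
print(hospid_peer)  # 4
```

Hospitalization 4 belongs to physician 3, who shares no specialty with physician 1, which is exactly the kind of row the specialty restriction is meant to drop.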

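The specialties of each physician are given in another df. In this example, physician 2 shares a specialty with physician 1 and with physician 3, but physicians 1 and 3 have no specialty in common.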

physician_spec <- data.frame(physician_id = c(1, 2, 2, 3),
                      specialty_code = c(100, 100, 200, 200))
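
One way to make the "shares at least one specialty" relation explicit is to join physician_spec to itself on specialty_code. This is only a sketch in base R, not code from the original post; the names pairs, physician_id_ego, and physician_id_alter are introduced here for illustration.

```r
physician_spec <- data.frame(physician_id = c(1, 2, 2, 3),
                             specialty_code = c(100, 100, 200, 200))

# Self-join on specialty_code: one row per (ego, alter) pair and shared specialty.
pairs <- merge(physician_spec, physician_spec,
               by = "specialty_code", suffixes = c("_ego", "_alter"))

# Drop self-pairs, then collapse pairs that share more than one specialty.
pairs <- pairs[pairs$physician_id_ego != pairs$physician_id_alter, ]
pairs <- unique(pairs[, c("physician_id_ego", "physician_id_alter")])
print(pairs)
```

On the sample data this leaves the pairs (1, 2), (2, 1), (2, 3), and (3, 2), and no pair linking physicians 1 and 3, in line with the description above.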

Tags: dplyr, subset, date-range

2 Answers

Stack Overflow user

Accepted answer

Answered 2022-08-16 19:56:29

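You can create two helper functions, other_mds and f. The first takes a physician id and returns the ids of the other physicians who have a matching specialty. The second takes a hospital id, a physician id, and a start date (i.e. the start date of a row in df) and returns the list of hospitalizations that ended within the prior 30 days, took place at the same hospital, and were done by a physician with a matching specialty.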

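# Ids of the other physicians who share at least one specialty with pid.
# Uses data.table syntax, so physician_spec must be a data.table when this is called.
other_mds <- function(pid) {
  physician_spec[
    physician_id!=pid & specialty_code %in% physician_spec[physician_id==pid, specialty_code],
    physician_id]
}

# Comma-separated ids of the hospitalizations at hospital hid that ended before
# start date s, less than 30 days earlier, and involved a physician sharing a
# specialty with pid. Hospitalizations in which pid took part are excluded.
f <- function(hid, pid, s) {
  other_phys = other_mds(pid)
  exclude_hosps = df[physician_id == pid, unique(hospitalization_id)]
  df[hospital_id == hid & 
       physician_id %in% other_phys &
       s>date_end &
       (s-date_end)<30 &
       !hospitalization_id %in% exclude_hosps,
     paste0(hospitalization_id, collapse=",")]
}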
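
A quick check of other_mds on the sample specialties may help before wiring it into f. The sketch below repeats the function so that it runs on its own, and assumes physician_spec is already a data.table, since the function relies on data.table's i/j syntax.

```r
library(data.table)

physician_spec <- data.table(physician_id = c(1, 2, 2, 3),
                             specialty_code = c(100, 100, 200, 200))

# Same definition as above, repeated so the example is self-contained.
other_mds <- function(pid) {
  physician_spec[
    physician_id != pid &
      specialty_code %in% physician_spec[physician_id == pid, specialty_code],
    physician_id]
}

print(other_mds(1))  # 2
print(other_mds(2))  # 1 3
print(other_mds(3))  # 2
```

Physician 2 is matched to both colleagues because one shared specialty is enough. A physician sharing two specialties with pid would appear twice in the result, which is harmless here because f only uses the result inside %in%.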

Now we simply apply the function f to each row.

library(data.table)
setDT(df)
setDT(physician_spec)
df[, matches:=f(hospital_id, physician_id,date_start), 1:nrow(df)]

Output:

    hospitalization_id hospital_id physician_id date_start   date_end matches
                 <num>      <char>        <num>     <Date>     <Date>  <char>
 1:                  1           A            1 2000-01-01 2000-01-03        
 2:                  2           A            1 2000-01-12 2000-01-18        
 3:                  3           A            1 2000-01-20 2000-01-22        
 4:                  1           A            2 2000-01-01 2000-01-03        
 5:                  2           A            2 2000-01-12 2000-01-18        
 6:                  3           A            2 2000-01-20 2000-01-22       4
 7:                  4           A            3 2000-01-12 2000-01-18       1
 8:                  5           A            3 2000-01-20 2000-01-22     1,2
 9:                  6           B            2 2000-02-10 2000-02-11        
10:                  7           B            2 2000-02-11 2000-02-14        
11:                  8           B            2 2000-02-12 2000-02-17
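
The by = 1:nrow(df) argument is what makes the call row-wise: every row becomes its own group, so f receives one hospital id, one physician id, and one start date at a time. A toy sketch of the difference follows; the table dt and its columns are made up for illustration.

```r
library(data.table)

dt <- data.table(a = 1:3, b = c(10, 20, 30))

# Without by, j sees whole columns: sum(a, b) is a single number recycled to all rows.
dt[, total_all := sum(a, b)]
# With by = 1:nrow(dt), j is evaluated once per row.
dt[, total_row := sum(a, b), by = 1:nrow(dt)]
print(dt)
```

Here total_all is 66 in every row, while total_row is 11, 22, and 33.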

Update: return the matches as a vector, then merge.

First change f so that it returns a vector:

f <- function(hid, pid, s) {
  other_phys = other_mds(pid)
  exclude_hosps = df[physician_id == pid, unique(hospitalization_id)]
  df[hospital_id == hid & 
       physician_id %in% other_phys &
       s>date_end &
       (s-date_end)<30 &
       !hospitalization_id %in% exclude_hosps]$hospitalization_id
}
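
The date condition in f and the window in the question are not quite the same at the boundary. The question keeps hospitalizations whose date_end falls between date_start - 30 and date_start - 1, while s > date_end & (s - date_end) < 30 stops at 29 days. A small sketch with made-up dates shows the difference; the names answer_rule and question_rule are illustrative only.

```r
s <- as.Date("2000-01-31")
# Candidate end dates 0, 1, 29, 30, and 31 days before s.
date_end <- s - c(0, 1, 29, 30, 31)

# Condition used in f: strictly before s and fewer than 30 days earlier.
answer_rule <- s > date_end & (s - date_end) < 30
# Window used in the question: from s - 30 to s - 1, both inclusive.
question_rule <- date_end >= s - 30 & date_end <= s - 1

print(data.frame(date_end, answer_rule, question_rule))
```

Only the 30-day case differs. If the inclusive window is what you want, writing (s - date_end) <= 30 in f would line the two up; for the sample data both rules give the same matches.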

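Now, when we run this function, we do so by hospitalization_id and physician_id, so it returns a three-column data.table (the columns are the by columns and a new column named match). That result is then merged back onto the original df.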

df[, .(match = f(hospital_id, physician_id,date_start)), .(hospitalization_id, physician_id)][
  df, 
  on=.(hospitalization_id,physician_id)
]

Output:

    hospitalization_id physician_id match hospital_id date_start   date_end
                 <num>        <num> <num>      <char>     <Date>     <Date>
 1:                  1            1    NA           A 2000-01-01 2000-01-03
 2:                  2            1    NA           A 2000-01-12 2000-01-18
 3:                  3            1    NA           A 2000-01-20 2000-01-22
 4:                  1            2    NA           A 2000-01-01 2000-01-03
 5:                  2            2    NA           A 2000-01-12 2000-01-18
 6:                  3            2     4           A 2000-01-20 2000-01-22
 7:                  4            3     1           A 2000-01-12 2000-01-18
 8:                  5            3     1           A 2000-01-20 2000-01-22
 9:                  5            3     2           A 2000-01-20 2000-01-22
10:                  6            2    NA           B 2000-02-10 2000-02-11
11:                  7            2    NA           B 2000-02-11 2000-02-14
12:                  8            2    NA           B 2000-02-12 2000-02-17
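
The join back onto df is what keeps rows that have no match: in X[Y, on = ...], every row of Y appears in the result, with NA where X has nothing to offer, and is repeated where X has several matching rows. A toy sketch with made-up tables (matches, base, and their columns are illustrative names):

```r
library(data.table)

# One row per match; id 2 has no match at all, id 1 has two.
matches <- data.table(id = c(1, 1, 3), match = c(10, 11, 30))
base <- data.table(id = c(1, 2, 3), label = c("a", "b", "c"))

# Every row of base is kept; id 2 gets NA and id 1 is duplicated.
res <- matches[base, on = .(id)]
print(res)
```

This mirrors the output above, where hospitalization 5 of physician 3 appears twice (matches 1 and 2) and unmatched rows carry NA.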

Votes: 1

Stack Overflow user

Answered 2022-08-16 19:47:19

Here is what I have so far. Please try it with other examples and other datasets to check whether it is what you are looking for:

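library(dplyr)  # %>%, group_by, summarise, mutate, filter, left_join
library(purrr)  # pmap
library(tidyr)  # unnest

df <- data.frame(hospitalization_id = c(1, 2, 3,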
                                        1, 2, 3,
                                        4, 5, 
                                        6, 7, 8),
                 hospital_id = c("A", "A", "A", 
                                 "A", "A", "A", 
                                 "A", "A",
                                 "B", "B", "B"),
                 physician_id = c(1, 1, 1, 
                                  2, 2, 2,
                                  3, 3, 
                                  2, 2, 2),
                 date_start = as.Date(c("2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-12", "2000-01-20",
                                        "2000-02-10", "2000-02-11", "2000-02-12")),
                 date_end = as.Date(c("2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-18", "2000-01-22",
                                      "2000-02-11", "2000-02-14", "2000-02-17")))

physician_spec <- data.frame(physician_id = c(1, 2, 2, 3),
                      specialty_code = c(100, 100, 200, 200)) %>%
  group_by(physician_id) %>%
  summarise(specialties = list(specialty_code))

df_with_date_range <- df %>%
  mutate(date_range1 = date_start - 31,
         date_range2 = date_start - 1) %>%
  as_tibble() %>%
  left_join(physician_spec, by = "physician_id") 

#Below uses specialty

df_with_date_range %>%
  mutate(hospital_id_in_range = pmap(list(date_range1, date_range2, hospital_id, physician_id, specialties),
                   function(x, y, z, p, s) filter(df_with_date_range,
                                                 date_start >= x & date_start <= y,
                                                 hospital_id == z,
                                                 physician_id != p,
                                                 any(specialties %in% s))$hospitalization_id)) %>%
  unnest(hospital_id_in_range, keep_empty = TRUE)

#Below does not use specialty

df_with_date_range %>%
  mutate(hospital_id_in_range = pmap(list(date_range1, date_range2, hospital_id, physician_id),
                   function(x, y, z, p) filter(df_with_date_range,
                                                 date_start >= x & date_start <= y,
                                                 hospital_id == z,
                                                 physician_id != p)$hospitalization_id)) %>%
  unnest(hospital_id_in_range, keep_empty = TRUE)

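For this dataset, the version that uses specialties and the version that does not give the same result, so you will have to play with it to see whether I made a mistake or whether that is simply how the data are. Essentially, I just joined the data frames and then added one more filter condition inside pmap.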
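
One possible reason the two versions agree is worth checking: any() always returns a single logical value, so a condition such as any(specialties %in% s) inside filter() is either TRUE for every row or FALSE for every row, rather than being tested physician by physician. The sketch below, in base R with made-up variable names, contrasts that with a per-row test; it is a suggestion to try, not a verified fix for the answer's code.

```r
# Specialties of physicians 1, 2, and 3, stored as a list column would be.
specialties <- list(100, c(100, 200), 200)
# Specialties of the ego physician (physician 1).
s <- 100

# Collapsed test: a single value for the whole column.
collapsed <- any(specialties %in% s)
print(length(collapsed))  # 1

# Per-row test: one value per physician.
per_row <- vapply(specialties, function(x) any(x %in% s), logical(1))
print(per_row)  # TRUE TRUE FALSE
```

With purrr loaded, map_lgl(specialties, ~ any(.x %in% s)) would be an equivalent per-row condition to try inside filter().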

Votes: 0

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/73379338
