
Find the set of rows in a row-specific range with restrictions at different levels

Stack Overflow user
Asked on 2022-08-16 19:25:50
2 answers, 89 views, 0 followers, 0 votes

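I have a dataset in which each row is identified by a hospitalization and a physician. Each row also holds the admission and discharge dates and the hospital where the stay took place. A hospitalization may involve several physicians, and a physician may work at more than one hospital.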

I have another dataset with each physician's specialty information (e.g., clinician, cardiologist). A physician may have more than one specialty.

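For each hospitalization-physician row, I want to identify all other hospitalizations completed by other physicians of the same specialty in the 30 days before the start of that hospitalization.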

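Building in part on the solution in R (dplyr): find all rows in row-specific range *WITH RESTRICTION*, I managed to write code that finds all hospitalizations done by other physicians in the same hospital in the 30 days before the start of a given hospitalization. For each row, I first find the list of all hospitalizations in the given hospital within the 30-day window. Then I find the list of only those hospitalizations in which the ego physician took part. Finally, I keep the elements of the first list that are not in the second.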
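The three steps just described amount to a per-row set difference. A minimal sketch in base R, using made-up ids that are only an illustration:

```r
# Hypothetical ids for one row: all stays in the window vs. the ego's own stays.
hospid_all <- c(1, 2, 3, 4)
hospid_ego <- c(1, 2, 3)

# Keep the stays that do not involve the ego physician.
hospid_peer <- hospid_all[!(hospid_all %in% hospid_ego)]
hospid_peer
# → 4
```

Unlike setdiff(), this indexing form keeps duplicates in the first vector, which is why the unique() step on the full list matters.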

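I would like to adapt the code to find hospitalizations by other physicians who share at least one specialty with the ego physician. Ideally, I would change the first step of the code above so that it lists all hospitalizations within the ego physician's specialties. I could then reuse the rest of the code as is, since it subtracts the hospitalizations involving the ego physician from that list. The main difficulty is that a physician may have multiple specialties; otherwise it would only take one more variable in the filter function.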

Below is my current code. It does not take the ego physician's specialty into account.

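library(dplyr)   # mutate, filter, select, rename, left_join
library(purrr)   # pmap
library(tidyr)   # unnest
library(furrr)   # future_map2

df <- data.frame(hospitalization_id = c(1, 2, 3,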
                                        1, 2, 3,
                                        4, 5, 
                                        6, 7, 8),
                 hospital_id = c("A", "A", "A", 
                                 "A", "A", "A", 
                                 "A", "A",
                                 "B", "B", "B"),
                 physician_id = c(1, 1, 1, 
                                  2, 2, 2,
                                  3, 3, 
                                  2, 2, 2),
                 date_start = as.Date(c("2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-12", "2000-01-20",
                                        "2000-02-10", "2000-02-11", "2000-02-12")),
                 date_end = as.Date(c("2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-18", "2000-01-22",
                                      "2000-02-11", "2000-02-14", "2000-02-17")))

df2 <- df %>%
  mutate(
    # Generates 30-day time interval before start of given hospitalization 
    date_range1 = date_start - 30,
    date_range2 = date_start - 1,
    # List of all hospitalizations in given hospital, in time interval
    hospid_all = pmap(list(date_range1, date_range2, hospital_id),
                      function(x, y, z) filter(df,
                                               date_end >= x & date_end <= y,
                                               hospital_id == z)$hospitalization_id),
    hospid_all = lapply(hospid_all, unique),
    # List of ego's hospitalizations in given hospital, in time interval
    hospid_ego = pmap(list(date_range1, date_range2, hospital_id, physician_id),
                      function(x, y, z, p) filter(df,
                                                  date_end >= x & date_end <= y,
                                                  hospital_id == z,
                                                  physician_id == p)$hospitalization_id),
    # List of peers' hospitalizations in given hospital, in time interval
    hospid_peer = future_map2(hospid_all, hospid_ego, ~ .x[!(.x %in% .y)])) %>%
  select(-starts_with('date_'), -hospid_all, -hospid_ego) %>% # only keep peers' list of hospitalization
  rename('ego'='physician_id')

df3 <- df2 %>%
  select(hospitalization_id, hospital_id, ego, hospid_peer) %>%
  unnest(hospid_peer, keep_empty = TRUE)

df4 <- df3 %>%
  left_join(select(df, hospitalization_id, physician_id), 
            by=c('hospid_peer'='hospitalization_id')) %>%
  rename(alter = physician_id)

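Each physician's specialties are recorded in another df. In this example, physician 2 shares a specialty with physician 1 and with physician 3, but physicians 1 and 3 have no specialty in common.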

physician_spec <- data.frame(physician_id = c(1, 2, 2, 3),
                      specialty_code = c(100, 100, 200, 200))
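
One way to handle physicians with several specialties is to turn the specialty table into ego-alter pairs first. The following is only a sketch in base R; the names pairs, peer_pairs, ego and alter are assumptions introduced for illustration:

```r
physician_spec <- data.frame(physician_id   = c(1, 2, 2, 3),
                             specialty_code = c(100, 100, 200, 200))

# Self-join on specialty_code: one row per (ego, alter, shared specialty).
pairs <- merge(physician_spec, physician_spec,
               by = "specialty_code", suffixes = c("_ego", "_alter"))

# Drop self-pairs and collapse pairs that share more than one specialty.
peer_pairs <- unique(pairs[pairs$physician_id_ego != pairs$physician_id_alter,
                           c("physician_id_ego", "physician_id_alter")])
names(peer_pairs) <- c("ego", "alter")
peer_pairs <- peer_pairs[order(peer_pairs$ego, peer_pairs$alter), ]
rownames(peer_pairs) <- NULL
peer_pairs
# ego alter: (1, 2), (2, 1), (2, 3), (3, 2)
```

Physicians 1 and 3 never appear together, which matches the description of the example.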

2 Answers

Stack Overflow user

Accepted answer

Posted on 2022-08-16 19:56:29

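You can create two helper functions, other_mds and f. The first takes a physician id and returns the ids of the other physicians who have a matching specialty. The second takes a hospital id, a physician id and a start date (i.e., the start date of a row in df) and returns the hospitalizations that ended within the preceding 30 days, at the same hospital, and were done by a physician with a matching specialty.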

other_mds <- function(pid) {
  physician_spec[
    physician_id!=pid & specialty_code %in% physician_spec[physician_id==pid, specialty_code],
    physician_id]
}

f <- function(hid, pid, s) {
  other_phys = other_mds(pid)
  exclude_hosps = df[physician_id == pid, unique(hospitalization_id)]
  df[hospital_id == hid & 
       physician_id %in% other_phys &
       s>date_end &
       (s-date_end)<30 &
       !hospitalization_id %in% exclude_hosps,
     paste0(hospitalization_id, collapse=",")]
}
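
The date condition in f can be checked on a few hand-picked dates. This sketch uses hypothetical dates and compares the window from the question's code (date_end between start - 30 and start - 1) with the condition used in f. Under these assumptions, a stay that ended exactly 30 days before the start falls inside the first window but outside the second:

```r
s <- as.Date("2000-01-31")                  # hypothetical start date
date_end <- s - c(0, 1, 29, 30, 31)         # stays ending 0..31 days earlier

# Window as written in the question's code.
in_question_window <- date_end >= s - 30 & date_end <= s - 1
# Condition as written in f.
in_answer_window <- s > date_end & (s - date_end) < 30

in_question_window
# → FALSE  TRUE  TRUE  TRUE FALSE
in_answer_window
# → FALSE  TRUE  TRUE FALSE FALSE
```

If the inclusive 30-day window is wanted, replacing < 30 with <= 30 in f would align the two.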

Now we simply apply the function f to each row:

library(data.table)
setDT(df)
setDT(physician_spec)
df[, matches:=f(hospital_id, physician_id,date_start), 1:nrow(df)]

Output:

    hospitalization_id hospital_id physician_id date_start   date_end matches
                 <num>      <char>        <num>     <Date>     <Date>  <char>
 1:                  1           A            1 2000-01-01 2000-01-03        
 2:                  2           A            1 2000-01-12 2000-01-18        
 3:                  3           A            1 2000-01-20 2000-01-22        
 4:                  1           A            2 2000-01-01 2000-01-03        
 5:                  2           A            2 2000-01-12 2000-01-18        
 6:                  3           A            2 2000-01-20 2000-01-22       4
 7:                  4           A            3 2000-01-12 2000-01-18       1
 8:                  5           A            3 2000-01-20 2000-01-22     1,2
 9:                  6           B            2 2000-02-10 2000-02-11        
10:                  7           B            2 2000-02-11 2000-02-14        
11:                  8           B            2 2000-02-12 2000-02-17        

Update: return a vector of matches, then merge:

  1. Change f so that it returns a vector:

f <- function(hid, pid, s) {
  other_phys = other_mds(pid)
  exclude_hosps = df[physician_id == pid, unique(hospitalization_id)]
  df[hospital_id == hid & 
       physician_id %in% other_phys &
       s>date_end &
       (s-date_end)<30 &
       !hospitalization_id %in% exclude_hosps]$hospitalization_id
}

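  2. Now, when we run the function, we do so by hospitalization_id and physician_id, so it returns a three-column data.table (the two by columns plus a new column named match). That result is then merged back onto the original df: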

df[, .(match = f(hospital_id, physician_id,date_start)), .(hospitalization_id, physician_id)][
  df, 
  on=.(hospitalization_id,physician_id)
]

Output:

    hospitalization_id physician_id match hospital_id date_start   date_end
                 <num>        <num> <num>      <char>     <Date>     <Date>
 1:                  1            1    NA           A 2000-01-01 2000-01-03
 2:                  2            1    NA           A 2000-01-12 2000-01-18
 3:                  3            1    NA           A 2000-01-20 2000-01-22
 4:                  1            2    NA           A 2000-01-01 2000-01-03
 5:                  2            2    NA           A 2000-01-12 2000-01-18
 6:                  3            2     4           A 2000-01-20 2000-01-22
 7:                  4            3     1           A 2000-01-12 2000-01-18
 8:                  5            3     1           A 2000-01-20 2000-01-22
 9:                  5            3     2           A 2000-01-20 2000-01-22
10:                  6            2    NA           B 2000-02-10 2000-02-11
11:                  7            2    NA           B 2000-02-11 2000-02-14
12:                  8            2    NA           B 2000-02-12 2000-02-17
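
The NA rows in this output come from the join, not from f itself: a group whose result is a zero-length vector contributes no rows to the grouped table, and joining back onto the full table restores it with NA. A minimal sketch of that mechanism, assuming data.table is installed; the table dt and the helper g are made up for illustration:

```r
library(data.table)

dt <- data.table(id = 1:3, x = c(10, 20, 30))

# Hypothetical helper: returns zero, one, or two values depending on x.
g <- function(x) if (x < 15) numeric(0) else if (x < 25) 1 else c(1, 2)

long <- dt[, .(match = g(x)), by = id]  # id 1 yields no rows here
res  <- long[dt, on = .(id)]            # id 1 returns with match = NA

nrow(long)  # 3 rows: one for id 2, two for id 3
nrow(res)   # 4 rows: id 1 restored by the join
```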
Votes: 1

Stack Overflow user

Posted on 2022-08-16 19:47:19

Here is what I have so far. Please try it on other examples and other datasets to check whether it is what you are looking for:

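library(dplyr)   # mutate, filter, group_by, summarise, left_join
library(purrr)   # pmap
library(tidyr)   # unnest

df <- data.frame(hospitalization_id = c(1, 2, 3,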
                                        1, 2, 3,
                                        4, 5, 
                                        6, 7, 8),
                 hospital_id = c("A", "A", "A", 
                                 "A", "A", "A", 
                                 "A", "A",
                                 "B", "B", "B"),
                 physician_id = c(1, 1, 1, 
                                  2, 2, 2,
                                  3, 3, 
                                  2, 2, 2),
                 date_start = as.Date(c("2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-12", "2000-01-20",
                                        "2000-02-10", "2000-02-11", "2000-02-12")),
                 date_end = as.Date(c("2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-18", "2000-01-22",
                                      "2000-02-11", "2000-02-14", "2000-02-17")))

physician_spec <- data.frame(physician_id = c(1, 2, 2, 3),
                      specialty_code = c(100, 100, 200, 200)) %>%
  group_by(physician_id) %>%
  summarise(specialties = list(specialty_code))

df_with_date_range <- df %>%
  mutate(date_range1 = date_start - 31,
         date_range2 = date_start - 1) %>%
  as_tibble() %>%
  left_join(physician_spec, by = "physician_id") 

#Below uses specialty

df_with_date_range %>%
  mutate(hospital_id_in_range = pmap(list(date_range1, date_range2, hospital_id, physician_id, specialties),
                   function(x, y, z, p, s) filter(df_with_date_range,
                                                 date_start >= x & date_start <= y,
                                                 hospital_id == z,
                                                 physician_id != p,
                                                 any(specialties %in% s))$hospitalization_id)) %>%
  unnest(hospital_id_in_range, keep_empty = TRUE)

#Below does not use specialty

df_with_date_range %>%
  mutate(hospital_id_in_range = pmap(list(date_range1, date_range2, hospital_id, physician_id),
                   function(x, y, z, p) filter(df_with_date_range,
                                                 date_start >= x & date_start <= y,
                                                 hospital_id == z,
                                                 physician_id != p)$hospitalization_id)) %>%
  unnest(hospital_id_in_range, keep_empty = TRUE)

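For this dataset, the version that uses specialty and the version that does not give the same result, so you will have to experiment with it to see whether I made a mistake or whether that is genuinely the case. Essentially, I just joined the dataframes and then added another filter condition inside pmap.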
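
One possible reason the two versions agree: any() returns a single logical for the whole table, so inside filter() it keeps either every row or none. A row-wise test gives one value per row instead. The sketch below uses base R on a toy table; the names spec, ego_specialties, collapsed and shares are assumptions for illustration:

```r
# Toy table with a list column of specialties, one row per physician.
spec <- data.frame(physician_id = c(1, 2, 3))
spec$specialties <- list(100, c(100, 200), 200)

ego_specialties <- 100  # specialties of physician 1

# any() over the whole column yields ONE logical for the entire table.
collapsed <- any(spec$specialties %in% ego_specialties)
length(collapsed)
# → 1

# Row-wise check: TRUE when that row shares a specialty with the ego.
shares <- vapply(spec$specialties,
                 function(sp) any(sp %in% ego_specialties),
                 logical(1))
spec$physician_id[shares & spec$physician_id != 1]
# → 2
```

Inside filter(), a condition of the same shape, such as purrr::map_lgl(specialties, ~ any(.x %in% s)), could take the place of any(specialties %in% s).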

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/73379338
