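I have a dataset in which each row is identified by a hospitalization and a physician. Each row also contains the admission and discharge dates and the hospital where the hospitalization took place. A hospitalization may involve several physicians, and a physician may work at several hospitals.
I have another dataset with each physician's specialty (e.g., general practitioner, cardiologist). A physician may have more than one specialty.
For each row, i.e. each hospitalization and its physician (the "ego" physician), I want the IDs of all other hospitalizations that were completed by other physicians of the same specialty in the 30 days before that hospitalization started.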
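Partly using the solution in R (dplyr): find all rows in row-specific range *WITH RESTRICTION*, I managed to write code that finds all hospitalizations carried out by other physicians at the same hospital in the 30 days before each hospitalization started. For each row, I first built the list of all hospitalizations at the given hospital within the 30-day window. Then I built a list containing only the hospitalizations that involved the ego physician. Finally, I kept the elements of the first list that are not in the second.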
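A minimal base-R sketch of those three steps for a single row, using the same example data as the question's code. The helper name peers_in_window is hypothetical, and the window is assumed to be date_end between start - 30 and start - 1, as in the question's code:

```r
# Example data from the question
df <- data.frame(
  hospitalization_id = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8),
  hospital_id  = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B"),
  physician_id = c(1, 1, 1, 2, 2, 2, 3, 3, 2, 2, 2),
  date_start = as.Date(c("2000-01-01", "2000-01-12", "2000-01-20",
                         "2000-01-01", "2000-01-12", "2000-01-20",
                         "2000-01-12", "2000-01-20",
                         "2000-02-10", "2000-02-11", "2000-02-12")),
  date_end = as.Date(c("2000-01-03", "2000-01-18", "2000-01-22",
                       "2000-01-03", "2000-01-18", "2000-01-22",
                       "2000-01-18", "2000-01-22",
                       "2000-02-11", "2000-02-14", "2000-02-17"))
)

# Hypothetical helper: peer hospitalizations for one (hospital, ego, start) row
peers_in_window <- function(data, hosp, ego, start, days = 30) {
  # Step 1: every hospitalization at `hosp` that ended in [start - days, start - 1]
  in_window <- data$hospital_id == hosp &
    data$date_end >= start - days & data$date_end <= start - 1
  all_ids <- unique(data$hospitalization_id[in_window])
  # Step 2: the subset of those that involved the ego physician
  ego_ids <- unique(data$hospitalization_id[in_window & data$physician_id == ego])
  # Step 3: keep what is in the first list but not in the second
  setdiff(all_ids, ego_ids)
}

peers_in_window(df, "A", 3, as.Date("2000-01-20"))  # returns c(1, 2)
```

For physician 3's hospitalization starting on 2000-01-20 at hospital A, step 1 finds hospitalizations 1, 2 and 4, and step 3 removes 4 because physician 3 took part in it.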
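I would like to adapt the code to find hospitalizations by other physicians who share at least one specialty with the ego physician. Ideally, I would change the first step of the code above so that it finds all hospitalizations within the ego physician's specialties. I could then use the rest of the code as is, subtracting the hospitalizations that involve the ego physician from this list. The main difficulty is that a physician may have several specialties; otherwise, it would be enough to add another variable to the filter call.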
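One way around the multi-specialty problem is to compare sets of specialty codes instead of single values: two physicians are peers if their sets of codes intersect. A base-R sketch, assuming the physician_spec table from the question; shares_specialty and peers_of are hypothetical helper names:

```r
# Specialty table from the question: physician 2 has two specialties
physician_spec <- data.frame(physician_id   = c(1, 2, 2, 3),
                             specialty_code = c(100, 100, 200, 200))

# TRUE if physicians `a` and `b` share at least one specialty code.
# Comparing the two sets works even when a physician has several rows.
shares_specialty <- function(spec, a, b) {
  spec_a <- spec$specialty_code[spec$physician_id == a]
  spec_b <- spec$specialty_code[spec$physician_id == b]
  length(intersect(spec_a, spec_b)) > 0
}

# All other physicians sharing at least one specialty with `ego`
peers_of <- function(spec, ego) {
  others <- setdiff(unique(spec$physician_id), ego)
  others[vapply(others, function(p) shares_specialty(spec, ego, p), logical(1))]
}

peers_of(physician_spec, 2)  # returns c(1, 3)
peers_of(physician_spec, 1)  # returns 2
```

The resulting vector could then serve as the extra filter condition in the first step, for example physician_id %in% peers_of(physician_spec, p) alongside the date and hospital conditions.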
Below is my current code. It does not take the ego physician's specialties into account.
library(dplyr)   # mutate(), filter(), select(), rename(), left_join()
library(purrr)   # pmap()
library(tidyr)   # unnest()
library(furrr)   # future_map2()

df <- data.frame(hospitalization_id = c(1, 2, 3,
                                        1, 2, 3,
                                        4, 5,
                                        6, 7, 8),
                 hospital_id = c("A", "A", "A",
                                 "A", "A", "A",
                                 "A", "A",
                                 "B", "B", "B"),
                 physician_id = c(1, 1, 1,
                                  2, 2, 2,
                                  3, 3,
                                  2, 2, 2),
                 date_start = as.Date(c("2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-12", "2000-01-20",
                                        "2000-02-10", "2000-02-11", "2000-02-12")),
                 date_end = as.Date(c("2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-18", "2000-01-22",
                                      "2000-02-11", "2000-02-14", "2000-02-17")))
df2 <- df %>%
  mutate(
    # Generates 30-day time interval before start of given hospitalization
    date_range1 = date_start - 30,
    date_range2 = date_start - 1,
    # List of all hospitalizations in given hospital, in time interval
    hospid_all = pmap(list(date_range1, date_range2, hospital_id),
                      function(x, y, z) filter(df,
                                               date_end >= x & date_end <= y,
                                               hospital_id == z)$hospitalization_id),
    hospid_all = lapply(hospid_all, unique),
    # List of ego's hospitalizations in given hospital, in time interval
    hospid_ego = pmap(list(date_range1, date_range2, hospital_id, physician_id),
                      function(x, y, z, p) filter(df,
                                                  date_end >= x & date_end <= y,
                                                  hospital_id == z,
                                                  physician_id == p)$hospitalization_id),
    # List of peers' hospitalizations in given hospital, in time interval
    hospid_peer = future_map2(hospid_all, hospid_ego, ~ .x[!(.x %in% .y)])) %>%
  select(-starts_with('date_'), -hospid_all, -hospid_ego) %>% # only keep peers' list of hospitalizations
  rename('ego' = 'physician_id')
df3 <- df2 %>%
  select(hospitalization_id, hospital_id, ego, hospid_peer) %>%
  unnest(hospid_peer, keep_empty = TRUE)
df4 <- df3 %>%
  left_join(select(df, hospitalization_id, physician_id),
            by = c('hospid_peer' = 'hospitalization_id')) %>%
  rename(alter = physician_id)
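Each physician's specialties are stored in a separate data frame. In this example, physician 2 shares a specialty with physician 1 and with physician 3, but physicians 1 and 3 have no specialty in common.
physician_spec <- data.frame(physician_id = c(1, 2, 2, 3),
                             specialty_code = c(100, 100, 200, 200))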
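Instead of looking up peers one physician at a time, every ego-peer pair can be precomputed with a self-join of this table on specialty_code. A base-R sketch with merge(); the column names physician_id_ego and physician_id_alter are illustrative:

```r
physician_spec <- data.frame(physician_id   = c(1, 2, 2, 3),
                             specialty_code = c(100, 100, 200, 200))

# Self-join on specialty_code: each row pairs an ego with a physician
# holding the same code (including the ego itself)
pairs <- merge(physician_spec, physician_spec,
               by = "specialty_code", suffixes = c("_ego", "_alter"))

# Drop self-pairs and collapse duplicates that appear when two physicians
# share more than one specialty
pairs <- unique(pairs[pairs$physician_id_ego != pairs$physician_id_alter,
                      c("physician_id_ego", "physician_id_alter")])
pairs <- pairs[order(pairs$physician_id_ego, pairs$physician_id_alter), ]
rownames(pairs) <- NULL
pairs
```

For the example this yields the pairs (1, 2), (2, 1), (2, 3) and (3, 2). Joining this table onto the hospitalization data by physician_id_alter would give, for each ego, the hospitalizations of same-specialty physicians, which is the modified first step described in the question.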
Posted 2022-08-16 19:56:29
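You can create two helper functions, other_mds and f. The first takes a physician id and returns the ids of the other physicians who share at least one specialty with that physician. The second takes a hospital id, a physician id, and a start date (i.e. the start date of one row of df) and returns the hospitalizations at the same hospital that ended within the previous 30 days, were handled by a physician with a matching specialty, and did not involve the given physician.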
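# Both helpers use data.table syntax on the global df and physician_spec,
# so they must be called after setDT() below.

# Ids of the other physicians who share at least one specialty with physician pid
other_mds <- function(pid) {
  physician_spec[
    physician_id != pid & specialty_code %in% physician_spec[physician_id == pid, specialty_code],
    physician_id]
}

# Hospitalizations at hospital hid that ended less than 30 days before s,
# handled by a same-specialty physician and not involving pid,
# returned as one comma-separated string
f <- function(hid, pid, s) {
  other_phys = other_mds(pid)
  exclude_hosps = df[physician_id == pid, unique(hospitalization_id)]
  df[hospital_id == hid &
       physician_id %in% other_phys &
       s > date_end &
       (s - date_end) < 30 &
       !hospitalization_id %in% exclude_hosps,
     paste0(hospitalization_id, collapse = ",")]
}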
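The window in f differs slightly from the question's code at the 30-day boundary: s > date_end & (s - date_end) < 30 keeps hospitalizations that ended 1 to 29 days before s, whereas date_end >= s - 30 & date_end <= s - 1 keeps those that ended 1 to 30 days before. A quick base-R check with hypothetical end dates:

```r
s <- as.Date("2000-01-31")
# Hypothetical end dates 0, 1, 29, 30 and 31 days before s
date_end <- s - c(0, 1, 29, 30, 31)

# Condition used in f(): ended strictly before s and less than 30 days earlier
in_answer <- s > date_end & (s - date_end) < 30
# Condition in the question's code: ended between s - 30 and s - 1 inclusive
in_question <- date_end >= s - 30 & date_end <= s - 1

data.frame(days_before = as.numeric(s - date_end), in_answer, in_question)
```

Only the hospitalization that ended exactly 30 days earlier is treated differently; if that case matters, changing < 30 to <= 30 in f would make the two windows agree.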
Now we just apply f to each row:
library(data.table)
setDT(df)
setDT(physician_spec)
df[, matches:=f(hospital_id, physician_id,date_start), 1:nrow(df)]
Output:
hospitalization_id hospital_id physician_id date_start date_end matches
<num> <char> <num> <Date> <Date> <char>
1: 1 A 1 2000-01-01 2000-01-03
2: 2 A 1 2000-01-12 2000-01-18
3: 3 A 1 2000-01-20 2000-01-22
4: 1 A 2 2000-01-01 2000-01-03
5: 2 A 2 2000-01-12 2000-01-18
6: 3 A 2 2000-01-20 2000-01-22 4
7: 4 A 3 2000-01-12 2000-01-18 1
8: 5 A 3 2000-01-20 2000-01-22 1,2
9: 6 B 2 2000-02-10 2000-02-11
10: 7 B 2 2000-02-11 2000-02-14
11: 8 B 2 2000-02-12 2000-02-17
Update: return the matches as a vector, then join.
First, change f so that it returns a vector:
f <- function(hid, pid, s) {
  other_phys = other_mds(pid)
  exclude_hosps = df[physician_id == pid, unique(hospitalization_id)]
  df[hospital_id == hid &
       physician_id %in% other_phys &
       s > date_end &
       (s - date_end) < 30 &
       !hospitalization_id %in% exclude_hosps]$hospitalization_id
}
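Now run this function by hospitalization_id and physician_id, so that it returns a three-column data.table (the two by columns plus a new column named match), and then join the result back onto the original df: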
df[, .(match = f(hospital_id, physician_id, date_start)), .(hospitalization_id, physician_id)][
  df,
  on = .(hospitalization_id, physician_id)
]
Output:
hospitalization_id physician_id match hospital_id date_start date_end
<num> <num> <num> <char> <Date> <Date>
1: 1 1 NA A 2000-01-01 2000-01-03
2: 2 1 NA A 2000-01-12 2000-01-18
3: 3 1 NA A 2000-01-20 2000-01-22
4: 1 2 NA A 2000-01-01 2000-01-03
5: 2 2 NA A 2000-01-12 2000-01-18
6: 3 2 4 A 2000-01-20 2000-01-22
7: 4 3 1 A 2000-01-12 2000-01-18
8: 5 3 1 A 2000-01-20 2000-01-22
9: 5 3 2 A 2000-01-20 2000-01-22
10: 6 2 NA B 2000-02-10 2000-02-11
11: 7 2 NA B 2000-02-11 2000-02-14
12: 8 2 NA B 2000-02-12 2000-02-17
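The updated version produces one row per (hospitalization, physician, match) and relies on the join to bring back rows without any match as NA. The same reshape-then-join pattern in base R, assuming the per-row match vectors are already collected in a list (the values mirror the output above; the variable names are illustrative):

```r
# A few (hospitalization, physician) rows and their match vectors,
# mirroring the output above; numeric(0) means "no match"
rows <- data.frame(hospitalization_id = c(3, 4, 5, 1),
                   physician_id       = c(2, 3, 3, 1))
matches <- list(4, 1, c(1, 2), numeric(0))

# Long format: repeat each key once per match; empty vectors add no rows
n <- lengths(matches)
long <- data.frame(hospitalization_id = rep(rows$hospitalization_id, n),
                   physician_id       = rep(rows$physician_id, n),
                   match              = unlist(matches))

# Left join back onto every row, so rows without matches reappear with NA
out <- merge(rows, long, by = c("hospitalization_id", "physician_id"), all.x = TRUE)
out
```

As in the data.table output, a row with two matches (hospitalization 5, physician 3) appears twice, and a row with none appears once with match set to NA.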
Posted 2022-08-16 19:47:19
Here is what I have so far. Please try it on other examples and other datasets to check whether it is what you are looking for:
library(dplyr)   # mutate(), filter(), left_join(), group_by(), summarise(), as_tibble()
library(purrr)   # pmap()
library(tidyr)   # unnest()

df <- data.frame(hospitalization_id = c(1, 2, 3,
                                        1, 2, 3,
                                        4, 5,
                                        6, 7, 8),
                 hospital_id = c("A", "A", "A",
                                 "A", "A", "A",
                                 "A", "A",
                                 "B", "B", "B"),
                 physician_id = c(1, 1, 1,
                                  2, 2, 2,
                                  3, 3,
                                  2, 2, 2),
                 date_start = as.Date(c("2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-01", "2000-01-12", "2000-01-20",
                                        "2000-01-12", "2000-01-20",
                                        "2000-02-10", "2000-02-11", "2000-02-12")),
                 date_end = as.Date(c("2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-03", "2000-01-18", "2000-01-22",
                                      "2000-01-18", "2000-01-22",
                                      "2000-02-11", "2000-02-14", "2000-02-17")))
physician_spec <- data.frame(physician_id = c(1, 2, 2, 3),
                             specialty_code = c(100, 100, 200, 200)) %>%
  group_by(physician_id) %>%
  summarise(specialties = list(specialty_code))
df_with_date_range <- df %>%
  mutate(date_range1 = date_start - 31,
         date_range2 = date_start - 1) %>%
  as_tibble() %>%
  left_join(physician_spec, by = "physician_id")
# Below uses specialty
df_with_date_range %>%
  mutate(hospital_id_in_range = pmap(
    list(date_range1, date_range2, hospital_id, physician_id, specialties),
    function(x, y, z, p, s) filter(df_with_date_range,
                                   date_start >= x & date_start <= y,
                                   hospital_id == z,
                                   physician_id != p,
                                   any(specialties %in% s))$hospitalization_id)) %>%
  unnest(hospital_id_in_range, keep_empty = TRUE)
# Below does not use specialty
df_with_date_range %>%
  mutate(hospital_id_in_range = pmap(
    list(date_range1, date_range2, hospital_id, physician_id),
    function(x, y, z, p) filter(df_with_date_range,
                                date_start >= x & date_start <= y,
                                hospital_id == z,
                                physician_id != p)$hospitalization_id)) %>%
  unnest(hospital_id_in_range, keep_empty = TRUE)
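For this dataset, the version that uses specialty and the version that does not give the same result, so you will need to experiment with it to see whether I made a mistake or whether that is simply how it is. In effect, I just joined the data frames and added one more filter condition inside pmap.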
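A likely reason the two versions agree: inside filter(), any(specialties %in% s) collapses to a single TRUE or FALSE for the whole table, which filter() then applies to every row. In addition, %in% converts list elements to character strings before matching, so an element holding more than one code, such as c(100, 200), never matches. A base-R illustration using the specialty lists from the example:

```r
# Specialty lists for physicians 1, 2 and 3, shaped like the list-column
# built by summarise(specialties = list(specialty_code))
specialties <- list(100, c(100, 200), 200)
s <- 100  # specialties of the ego physician (physician 1)

# The answer's condition: one value for the whole table
cond_answer <- any(specialties %in% s)

# Row-wise condition: does each row's physician share a code with the ego?
cond_rowwise <- vapply(specialties, function(v) any(v %in% s), logical(1))

cond_answer   # TRUE
cond_rowwise  # TRUE TRUE FALSE
```

In the answer's pipeline, a row-wise condition could be written as purrr::map_lgl(specialties, ~ any(.x %in% s)) in place of any(specialties %in% s). Note also that this answer filters on date_start with a 31-day lower bound and does not exclude hospitalization IDs that the ego physician shares with another physician, whereas the question's code filters on date_end over 30 days and subtracts the ego's hospitalizations.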
https://stackoverflow.com/questions/73379338