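I am trying to extract a set of indirect colleagues among doctors. I define colleagues as doctors who work at the same hospital. An indirect colleague is a doctor who works, at another hospital, with one of a doctor's colleagues. In the example below, doctor "a" works at hospital 1 with doctor "b", who works at hospital 2 with doctor "c", so "c" is an indirect colleague of "a".
The code below works fine when the doctor ids are strings (df0) or low numeric values (df1), but it fails when the doctor ids are high numeric values (df2). I would like to fix the code so that it works with high numeric values while keeping the doctors' original ids.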
library(tidyverse)

df0 <- tribble(
  ~hospital, ~doctors,
  1, c("a", "b"),
  2, c("b", "c"),
  3, c("a", "d"),
) %>%
  unnest(doctors)

# Below, I replaced doctor id with numeric values
df1 <- tribble(
  ~hospital, ~doctors,
  1, c(1, 2),
  2, c(2, 3),
  3, c(1, 4),
) %>%
  unnest(doctors)

# Now I added +5 to each physician id
df2 <- tribble(
  ~hospital, ~doctors,
  1, c(6, 7),
  2, c(7, 8),
  3, c(6, 9)
) %>%
  unnest(doctors)

df <- df2 # The code only works with df0 and df1, not with df2

# Named list: one element per doctor, holding that doctor's direct colleagues
colleagues <- full_join(df, df, by = c("hospital")) %>%
  rename(doctor = doctors.x, colleagues = doctors.y) %>%
  filter(doctor != colleagues) %>%
  distinct(doctor, colleagues) %>%
  chop(colleagues) %>%
  deframe()

# For each (ego, alter) pair, keep the alter's colleagues that are not the ego's
colleagues %>%
  enframe(name = "ego",
          value = "alter") %>%
  unnest(alter) %>%
  mutate(ego_colleagues = map(ego, ~ colleagues[[.x]]),
         alter_colleagues = map(alter, ~ colleagues[[.x]]),
         alter_colleague_only = map2(alter_colleagues, ego_colleagues, ~ .x[!(.x %in% .y)])) %>%
  unnest(alter_colleague_only) %>%
  filter(ego != alter_colleague_only) %>%
  select(ego, alter, alter_colleague_only)
Posted on 2022-06-23 02:14:39
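The problem is in your map calls. With df2, when you call map(ego, ~ colleagues[[.x]]) and .x is numeric, colleagues[[.x]] indexes by position rather than by name. With character ids, the lookup is by name. With the numeric ids 1, 2, 3, 4, it works only by luck, because the ids happen to coincide with the positions in the list. But when you have a list of 4 elements and call colleagues[[6]], you get a subscript out of bounds error. If that is not entirely clear, print colleagues[[1]], colleagues[[6]], and colleagues$`6` and compare them. A quick fix is to wrap the first argument of those map calls in as.character, like this: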
colleagues %>%
  enframe(name = "ego",
          value = "alter") %>%
  unnest(alter) %>%
  # Look ids up by name: convert them to character before indexing
  mutate(ego_colleagues = map(as.character(ego), ~ colleagues[[.x]]),
         alter_colleagues = map(as.character(alter), ~ colleagues[[.x]]),
         # List columns are passed as-is; as.character() would collapse each vector into one string
         alter_colleague_only = map2(alter_colleagues, ego_colleagues, ~ .x[!(.x %in% .y)])) %>%
  unnest(alter_colleague_only) %>%
  filter(ego != alter_colleague_only) %>%
  select(ego, alter, alter_colleague_only)
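To see the indexing difference in isolation, here is a minimal base R sketch. The list below is a hand-written stand-in with the same shape as the `colleagues` list built from df2 (an assumption for illustration, not output copied from the code above), and the variable names are made up for this example.

```r
# Toy list with the same shape as `colleagues` built from df2:
# four elements whose names are the doctor ids "6" through "9".
colleagues <- list(`6` = c(7, 9), `7` = c(6, 8), `8` = 7, `9` = 6)

# A numeric index selects by position: 1 is the first element (doctor "6")
by_position <- colleagues[[1]]

# A character index selects by name
by_name <- colleagues[["6"]]

# A numeric 6 asks for the sixth element of a four-element list, which fails;
# tryCatch() captures the error message instead of stopping the script
out_of_bounds <- tryCatch(colleagues[[6]], error = function(e) conditionMessage(e))

# Converting the id to character restores the lookup by name
fixed <- colleagues[[as.character(6)]]
```

Printing `by_position`, `by_name`, and `fixed` should show the same vector, while `out_of_bounds` holds the error text.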
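Update: depending on your setup, you could try the furrr package, with future_map and future_map2 in place of map and map2, but at least on this minimal example that approach is much slower. I don't know whether that would also be true for your real data.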
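As a rough sketch of what that swap could look like, the snippet below runs one name-based lookup through `future_map`. It assumes the furrr and future packages are installed; the toy `colleagues` list, the `ids` vector, and the choice of two workers are illustrative assumptions, not part of the original code.

```r
library(furrr)

# Toy lookup list keyed by doctor id (hand-written stand-in for `colleagues`)
colleagues <- list(`6` = c(7, 9), `7` = c(6, 8), `8` = 7, `9` = 6)
ids <- c(6, 7, 8, 9)

# Assumption: two local background R sessions; adjust to your machine
future::plan(future::multisession, workers = 2)

# Parallel counterpart of map(as.character(ids), ~ colleagues[[.x]])
res <- future_map(as.character(ids), ~ colleagues[[.x]])

# Return to ordinary sequential evaluation
future::plan(future::sequential)
```

On data this small, the cost of starting workers and shipping objects to them is likely to outweigh any gain, which fits the slowdown described above.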
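Here is another option. It is ugly because it creates a lot of intermediate objects, but it may help. It uses matrices and takes advantage of the fact that these relationships are reciprocal (if I am interpreting them correctly). I benchmarked it, and it took about half the time.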
t1 <- colleagues %>%
  enframe(name = "ego",
          value = "alter") %>%
  unnest(alter) %>%
  filter(!duplicated(paste0(pmax(ego, alter), pmin(ego, alter)))) %>%
  as.matrix()
t2 <- t1 %>%
  rbind(t1[1:nrow(t1), c(2, 1)])
alter_colleague_only <- t2[match(t2[, 2], t2[, 1]), "alter"]
t3 <- cbind(t2, alter_colleague_only)
t4 <- t3[which(t2[, 1] != t3[, 3]), ]
t5 <- t4[, c(3, 2, 1)]
t6 <- rbind(t4, t5) %>%
  as_tibble() %>%
  arrange(ego)
https://stackoverflow.com/questions/72722617