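I would like to solve the following problem with dplyr in R. It has already been answered with data.table here: Finding indirect nodes for every edge (in R), but because the rest of my code uses dplyr, I need to adapt it.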
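I have information on groups of doctors who work together at particular hospitals. A doctor can work at several hospitals at the same time. I would like to write code that outputs all indirect colleagues of a given doctor working at a given hospital. For example, if I work at a given hospital with another doctor who also works at a second hospital, I want to know who my colleague works with at that second hospital.
Consider a simple example with three hospitals (1, 2, 3) and five doctors (A, B, C, D, E). Doctors A, B and C work together at hospital 1. Doctors A, B and D work together at hospital 2. Doctors B and E work together at hospital 3.
For each doctor working at a particular hospital, I want information on their indirect colleagues through each of their direct colleagues. For example, doctor A at hospital 1 has one indirect colleague through doctor B: doctor E at hospital 3. On the other hand, doctor B at hospital 1 has no indirect colleagues through doctor A. Doctor C at hospital 1 has two indirect colleagues through doctor B: doctor D at hospital 2 and doctor E at hospital 3. And so on.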
Here is the object describing the network of doctors across all hospitals:
library(dplyr)

edges <- tibble(
  hosp = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"),
  from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"),
  to   = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")
) %>% arrange(hosp, from, to)
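The edge list above records every pair of co-workers in both directions, once per shared hospital. As a minimal sketch of how such a list could be derived, the snippet below starts from a hypothetical `membership` table (one row per hospital and doctor; this table and the name `edges_from_membership` are assumptions for illustration, not part of the question) and self-joins it on `hosp`. It assumes dplyr 1.1.0 or later for the `relationship` argument.

```r
library(dplyr)

# Hypothetical membership table: one row per (hospital, doctor) pair
membership <- tibble(
  hosp   = c("1", "1", "1", "2", "2", "2", "3", "3"),
  doctor = c("A", "B", "C", "A", "B", "D", "B", "E")
)

# Pair every doctor with every other doctor at the same hospital
edges_from_membership <- membership %>%
  rename(from = doctor) %>%
  inner_join(
    rename(membership, to = doctor),
    by = "hosp",
    relationship = "many-to-many"  # assumes dplyr >= 1.1.0
  ) %>%
  filter(from != to) %>%           # a doctor is not their own colleague
  arrange(hosp, from, to)
```

For the three-hospital example this gives 14 directed pairs: six each at hospitals 1 and 2, and two at hospital 3.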
I would like code that produces the following output:
output <- tibble(hosp = c("1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3"),
from = c("A", "A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "D", "D", "D", "B", "E", "E", "E", "E"),
to = c("C", "B", "C", "A", "B", "A", "B", "D", "B", "A", "D", "A", "B", "B", "E", "B", "B", "B", "B"),
hosp_ind = c("" , "3", "" , "" , "2", "2", "3", "" , "3", "" , "" , "1", "1", "3", "" , "1", "1", "2", "2"),
to_ind = c("" , "E", "" , "" , "D", "D", "E", "" , "E", "" , "" , "C", "C", "E", "" , "A", "C", "A", "D")) %>% arrange(hosp, from, to)
Posted on 2021-05-23 05:39:40
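Actually, you can translate the data.table solution to dplyr as follows:
library(dplyr)
library(igraph)

# graph_from_data_frame() reads the first two columns (hosp, from) as the edge
# list, so g is a bipartite graph linking hospitals to doctors
g <- simplify(graph_from_data_frame(edges, directed = FALSE))

edges %>%
  rowwise() %>%
  do(cbind(., {
    # co-workers of `to` (distance exactly 2) that are not co-workers of `from`
    to_ind <- setdiff(
      do.call(
        setdiff,
        Map(names, ego(g, 2, c(.$to, .$from), mindist = 2))
      ), .$from
    )
    if (!length(to_ind)) {
      hosp_ind <- to_ind <- NA_character_
    } else {
      # hospitals where each indirect colleague works
      hosp_ind <- lapply(to_ind, function(v) names(neighbors(g, v)))
    }
    data.frame(
      hosp_ind = unlist(hosp_ind),
      to_ind = rep(to_ind, lengths(hosp_ind))
    )
  }))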
This gives you
# A tibble: 19 x 5
   hosp  from  to    hosp_ind to_ind
   <chr> <chr> <chr> <chr>    <chr>
 1 1     A     B     3        E
 2 1     A     C     NA       NA
 3 1     B     A     NA       NA
 4 1     B     C     NA       NA
 5 1     C     A     2        D
 6 1     C     B     2        D
 7 1     C     B     3        E
 8 2     A     B     3        E
 9 2     A     D     NA       NA
10 2     B     A     NA       NA
11 2     B     D     NA       NA
12 2     D     A     1        C
13 2     D     B     1        C
14 2     D     B     3        E
15 3     B     E     NA       NA
16 3     E     B     1        A
17 3     E     B     2        A
18 3     E     B     1        C
19 3     E     B     2        D
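To make the graph step above easier to follow, here is a small sketch of the idea it relies on: in a bipartite hospital-doctor graph, the vertices at distance exactly 2 from a doctor are that doctor's co-workers. The `membership` data frame and the variable names are assumptions introduced for illustration; `ego()` and `neighbors()` are the same igraph functions used in the answer.

```r
library(igraph)

# Hypothetical membership table for the three-hospital example
membership <- data.frame(
  hosp   = c("1", "1", "1", "2", "2", "2", "3", "3"),
  doctor = c("A", "B", "C", "A", "B", "D", "B", "E")
)

# Bipartite graph: every edge links a hospital to a doctor who works there
g <- graph_from_data_frame(membership, directed = FALSE)

# Vertices at distance exactly 2 from a doctor are that doctor's co-workers
coworkers_A <- sort(names(ego(g, order = 2, nodes = "A", mindist = 2)[[1]]))
coworkers_B <- sort(names(ego(g, order = 2, nodes = "B", mindist = 2)[[1]]))

# Co-workers of B that A does not already work with, excluding A itself
indirect_A_via_B <- setdiff(setdiff(coworkers_B, coworkers_A), "A")

# The hospitals of an indirect colleague are that vertex's neighbours
hosp_of_E <- names(neighbors(g, "E"))
```

Here `indirect_A_via_B` contains only E and `hosp_of_E` is hospital 3, which matches the rows for A and B in the output above.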
Posted on 2021-05-22 08:22:47
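Since you only seem to need the first level of indirect connections in the network, this is quite simple to do without a graph data structure.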
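library(dplyr)
library(tidyr)  # for replace_na()

# Takes the rows of `edges` for one (hosp, from) group and returns one row per
# indirect colleague reachable through a direct colleague
get_indirects <- function(hosp_from) {
  x <- hosp_from$from[1]
  h <- hosp_from$hosp[1]
  # direct colleagues of x, at any hospital
  directs <- edges %>%
    filter(from == x) %>%
    pull(to)
  # colleagues of those colleagues who are neither x nor direct colleagues of x
  edges %>%
    filter(from %in% directs & !(to %in% append(directs, x))) %>%
    rename(to = from, hosp_ind = hosp, to_ind = to) %>%
    select(to, hosp_ind, to_ind) %>%
    mutate(hosp = h, from = x, .before = to)
}

split_edges <- edges %>%
  group_by(hosp, from) %>%
  group_split()

indirect_df <- lapply(split_edges, get_indirects) %>% bind_rows()

# edges with no indirect colleague keep empty hosp_ind / to_ind
direct_df <- anti_join(edges, indirect_df[, c("from", "to")], by = c("from", "to"))

output <- bind_rows(indirect_df, direct_df) %>%
  replace_na(list(hosp_ind = "", to_ind = "")) %>%
  arrange(hosp, from, to)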
This produces the same output as the expected output for the example.
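The same first-level lookup can also be written with joins only. The following is a sketch, not either answerer's method: it joins the edge list to itself so that each direct colleague `to` brings along their own colleagues, then drops the doctor themselves and anyone who is already a direct colleague. The names `direct_pairs`, `indirect` and `result` are introduced here for illustration, and the `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

edges <- tibble(
  hosp = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"),
  from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"),
  to   = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")
)

# Every direct pair, regardless of hospital
direct_pairs <- distinct(edges, from, to)

indirect <- edges %>%
  # attach the colleagues of each direct colleague `to`
  inner_join(
    rename(edges, hosp_ind = hosp, to_ind = to),
    by = c("to" = "from"),
    relationship = "many-to-many"  # assumes dplyr >= 1.1.0
  ) %>%
  filter(to_ind != from) %>%                                # not the doctor themselves
  anti_join(direct_pairs, by = c("from", "to_ind" = "to"))  # not already a direct colleague

# Edges without any indirect colleague keep NA in hosp_ind and to_ind
result <- edges %>%
  left_join(indirect, by = c("hosp", "from", "to")) %>%
  arrange(hosp, from, to, to_ind, hosp_ind)
```

On the example data `result` has 19 rows, 7 of which have no indirect colleague. The missing values are left as `NA` here; `tidyr::replace_na()` can turn them into empty strings if the format in the question is required.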
https://stackoverflow.com/questions/67643807