I have the following data:
head(graph_data, n = 15)
source target
1 Ohrid СКОПЈЕ
2 Ohrid СКОПЈЕ
3 Ohrid СКОПЈЕ
4 Ohrid СКОПЈЕ
5 Ohrid СКОПЈЕ
6 Ohrid СКОПЈЕ
7 Ohrid СКОПЈЕ
8 Ohrid СКОПЈЕ
9 Ohrid СКОПЈЕ
10 Ohrid СКОПЈЕ
11 Ohrid СКОПЈЕ
12 Ohrid СКОПЈЕ
13 Ohrid СКОПЈЕ
14 Ohrid СКОПЈЕ
15 Ohrid СКОПЈЕ
I wrote the function below to automatically filter the most frequent matches for a given source:
top_connections <- function(data, city, top_n) {
  temp <- filter(data, source == city)
  temp2 <- as.data.frame(table(temp$target))
  temp2 <- arrange(temp2, desc(Freq))
  temp2 <- temp2[1:top_n, ]
  temp3 <- as.data.frame(unique(temp2$Var1))
  colnames(temp3)[1] <- "top_connecitons"
  # works fine until here
  temp4 <- subset(temp, source %in% temp3[, "top_connecitons"])
  return(temp4)
}
The only problem I'm having is the last step: subsetting temp into temp4 so that it keeps only the values found in temp3.
The result is a data frame with zero rows, when it should contain the top 15 target connections of the filtered city.
Function call:
test1 <- top_connections(graph_data, "Skopje", top_n = 15)
Any idea where I went wrong?
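One way to check where the rows disappear, sketched on small hypothetical data (the rows below are invented for illustration, not taken from fixed.xlsx): after the filter step every row of `temp` has `source == city`, so `source %in% top_targets` can only be `TRUE` when the city is one of its own top targets. It may also be worth confirming that the city is spelled in the source column exactly as passed, since `==` compares strings exactly and `"Skopje"` does not equal the Cyrillic `СКОПЈЕ` that appears in the sample.

```r
# Hypothetical toy data: Skopje's targets never include Skopje itself
toy <- data.frame(
  source = c("Skopje", "Skopje", "Skopje", "Ohrid"),
  target = c("Ohrid", "Ohrid", "Bitola", "Skopje"),
  stringsAsFactors = FALSE
)

# Same first step as top_connections(): rows starting at the city
temp <- toy[toy$source == "Skopje", ]
# Targets of Skopje, most frequent first
top_targets <- names(sort(table(temp$target), decreasing = TRUE))

# Every row of temp has source == "Skopje", which is not a top target
any(temp$source %in% top_targets)   # FALSE, so subset() keeps 0 rows
# Testing the target column instead keeps all three rows
sum(temp$target %in% top_targets)   # 3
```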
Update:
Link to the data: fixed.xlsx
Environment:
search()
[1] ".GlobalEnv" "package:networkD3"
[3] "package:data.table" "package:DT"
[5] "package:corrplot" "package:scales"
[7] "package:dplyr" "package:purrr"
[9] "package:readr" "package:tidyr"
[11] "package:tibble" "package:tidyverse"
[13] "package:ggthemes" "package:ggplot2"
[15] "package:readxl" "package:lubridate"
[17] "tools:rstudio" "package:stats"
[19] "package:graphics" "package:grDevices"
[21] "package:utils" "package:datasets"
[23] "package:methods" "Autoloads"
[25] "package:base"
Posted on 2017-05-22 11:44:27
graph_data <- data.frame(source = c("Paris", "Berlin", "Paris", "London", "Munich"),
                         target = c("Amsterdam", "Paris", "Paris", "Brighton", "Paris"),
                         stringsAsFactors = FALSE)
top_connections <- function(data, city, top_n) {
  temp <- dplyr::filter(data, source == city)
  temp2 <- as.data.frame(table(temp$target))
  temp2 <- dplyr::arrange(temp2, dplyr::desc(Freq))
  temp2 <- temp2[1:top_n, ]
  temp3 <- as.data.frame(unique(temp2$Var1))
  colnames(temp3)[1] <- "top_connecitons"
  temp4 <- subset(temp, source %in% temp3[, "top_connecitons"])
  return(temp4)
}
Try it:
top_connections(graph_data,"Paris",2)
source target
1 Paris Amsterdam
2 Paris Paris
https://stackoverflow.com/questions/44111430
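This example returns rows only because Paris happens to be one of its own targets: the final `subset()` tests `source`, while the top connections are values of `target`. Below is a minimal sketch of a version that matches on `target` instead, assuming that is the intended behaviour. The name `top_connections2` is hypothetical, and it uses base R rather than dplyr so it runs without attaching any package.

```r
# Hypothetical variant: same idea as top_connections(), but the final
# subset matches on target instead of source
top_connections2 <- function(data, city, top_n) {
  # Keep only the rows that start at the requested city
  temp <- data[data$source %in% city, , drop = FALSE]
  # Target names ordered from most to least frequent
  ranked <- names(sort(table(temp$target), decreasing = TRUE))
  # head() returns at most top_n names, so no NA entries are created
  top_targets <- head(ranked, top_n)
  # Keep the rows whose target is one of the top targets
  temp[temp$target %in% top_targets, , drop = FALSE]
}

graph_data <- data.frame(
  source = c("Paris", "Berlin", "Paris", "London", "Munich"),
  target = c("Amsterdam", "Paris", "Paris", "Brighton", "Paris"),
  stringsAsFactors = FALSE
)

# Berlin's only target is Paris, so one row is returned
nrow(top_connections2(graph_data, "Berlin", 1))  # 1
```

With the original `source %in%` test, the same Berlin call returns zero rows, because Berlin is not among its own targets. That would also explain the zero-row result for Skopje if Skopje never appears among its own top targets.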