我发现了Richard在R中计算网络范围函数的博客帖子 (Burt1981;Reagans和McEvily 2003)。该函数根据每个网络节点的接触数和这些节点的互连性为其分配值。网络范围可以通过节点的子组(例如,女性和男性节点)来计算。它们作为顶点的属性存储。
作者的例子非常具有说明性,但它基于一个相对较小的网络(大约100个节点)。我有一个大约有200000个节点的网络,这意味着这个函数的性能不适合我的分析。
我给你们举一个例子,根据Erdos-Renyi模型,随机图的大小可以被操纵,以计时函数的性能。
我不熟悉优化R码,但我认为需要更有效地存储邻接矩阵(例如,稀疏矩阵)。到目前为止,我的努力没有成功地发挥工作作用。
rm(list=ls())
library(igraph)
library(statnet)
library(intergraph)
library(tictoc)
set.seed(42)
## Source: https://ramorel.github.io/network-range/
## Function to find network range for each node in a network
## Arguments:
## net = adjacency matrix, igraph graph, or network object
## attr = Vector of attributes associated with each node in net
## directed = boolean indicated if the network is directed or not
netrange <- function(net, attr, directed = TRUE){
require(reshape2)
if (class(net) == "igraph") {
net <- as_adjacency_matrix(net, sparse = F)
}
else {
if(class(net) == "network") {
net <- as.matrix.network(net)
}
else {
net <- as.matrix(net)
}
}
if(nrow(net) != length(attr)) {
stop("Number of nodes must match length of attributes vector")
}
else {
if (directed == TRUE){
ns <- colnames(net)
el <- melt(net, varnames=c("ego", "alter"), value.name = "weight")
df <- cbind(rownames(net), attr)
el$ego_grp <- df[match(el[,1], df[,1]), 2]
el$alter_grp <- df[match(el[,2], df[,1]), 2]
#FINDING p_k, the strength of ties within each group
# z_iq = sum of strength of ties from nodes in group _k_ to all other alters
# z_ij = sum of strength of ties from nodes in group _k_ to alters in group _k_
z_iq <- sapply(unique(attr), function(x) {
sum(el[which(el$ego_grp==x), "weight"])
})
z_ij <- sapply(unique(attr), function(x) {
sum(el[which(el$ego_grp==x & el$alter_grp==x), "weight"])
})
p_k <- z_ij / z_iq
p_k[is.na(p_k)] <- 0
#FINDING p_ik, the strength of connection from person i to group k
# x_iq = sum of strength of ties for _i_ to alters in group _k_
# x_ij = sum of strength of ties for _i_ to all alters
x_ij <- sapply(colnames(net), function(x) {
sum(el[which(el$ego==x), "weight"])
}
)
x_iq <- list(NULL)
for(i in colnames(net)) {
x_iq[[i]] <- sapply(unique(attr), function(x) {
sum(el[which(el$ego==i & el$alter_grp==x), "weight"])
}
)
}
x_iq <- x_iq[-c(1)] #x_iq is now a list where each elements is a vector of node _i_ summed strength of tie to group _k_
p_ik <- lapply(1:length(x_iq),
function(x) x_iq[[x]] / x_ij[x])
# FINDING nd_i, the network diversity score for node _i_
nd_i <- sapply(1:length(p_ik),
function(x) 1 - sum(p_k*p_ik[[x]]^2, na.rm = F)
)
}
else {
ns <- colnames(net)
el <- melt(net, varnames=c("ego", "alter"), value.name = "weight")
dup <- data.frame(t(apply(el[,1:2],1,sort)))
el <- el[!duplicated(dup),]
df <- cbind(rownames(net), attr)
el$ego_grp <- df[match(el[,1], df[,1]), 2]
el$alter_grp <- df[match(el[,2], df[,1]), 2]
#FINDING p_k, the strength of ties within each group
# z_iq = sum of strength of ties from nodes in group _k_ to all other alters
# z_ij = sum of strength of ties from nodes in group _k_ to alters in group _k_
z_iq <- sapply(unique(attr), function(x) {
sum(el[which(el$ego_grp==x | el$alter_grp==x), "weight"])
})
z_ij <- sapply(unique(attr), function(x) {
sum(el[which(el$ego_grp==x & el$alter_grp==x), "weight"])
})
p_k <- z_ij / z_iq
p_k[is.na(p_k)] <- 0
#FINDING p_ik, the strength of connection from person i to group k
# x_iq = sum of strength of ties for _i_ to alters in group _k_
# x_ij = sum of strength of ties for _i_ to all alters
x_ij <- sapply(colnames(net), function(x) {
sum(el[which(el$ego==x | el$alter==x), "weight"])
}
)
x_iq <- list(NULL)
for(i in colnames(net)) {
x_iq[[i]] <- sapply(unique(attr), function(x) {
sum(el[which(el$ego==i & el$alter_grp==x), "weight"],
el[which(el$alter==i & el$ego_grp==x), "weight"])
}
)
}
x_iq <- x_iq[-c(1)] #x_iq is now a list where each elements is a vector of node _i_ summed strength of tie to group _k_
p_ik <- lapply(1:length(x_iq),
function(x) x_iq[[x]] / x_ij[x])
# FINDING nd_i, the network diversity score for node _i_
nd_i <- sapply(1:length(p_ik),
function(x) 1 - sum(p_k*p_ik[[x]]^2, na.rm = F)
)
}
return(nd_i)
}
}
# Generate exemplary network
g <- igraph::erdos.renyi.game(1000, 150, type = "gnm")
## Add categorical (binary) vertex feature: female
V(g)$female <- sample(c(0,1), replace=TRUE, size=length(V(g)))
V(g)$female
## transform igraph to statnet
net <- intergraph::asNetwork(g)
## Apply network function
tic()
range_female <- netrange(net,
net %v% "female",
directed = T)
seq_time <- toc()
发布于 2022-02-08 12:10:44
最好的优化是使用profvis
分析,它显示了你的瓶颈在哪里。不过,我也参加了你的工作。
在那里,我们看到x_ij <- sapply(colnames(net), function(x) { sum(el[which(el$ego==x | el$alter==x), "weight"]) })
花了很长时间。
我更喜欢data.table
自己,所以在这里提供代码来加速这个部分,它只需几秒钟。然后尝试同样的方法来创建x_iq。
xx_ij <- el[, .(xxij = lapply(.SD, sum)), by = c("ego"), .SDcols = c("weight")]
xx_ij2 <- xx_ij$xxij
names(xx_ij2) <- xx_ij$ego
identical(x_ij, unlist(xx_ij2))
# TRUE
**profiling
library(profvis)
p <- profvis({
## code here
})
f <- paste0("profile_", as.Date(now()), ".html")
htmlwidgets::saveWidget(p, f)
browseURL(f)
https://stackoverflow.com/questions/71030187
复制相似问题