I am looking for an algorithm to cluster 43,429 points on a map (latitude/longitude):
latitude longitude expenses
603680.0 2270029.0 272.0
618559.0 2219632.0 385.0
. . .
However, every cluster must have the same total for the third feature (expenses).
Algorithms like k-means do not produce clusters with equal "weight".
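A minimal sketch of why k-means alone does not help, assuming a data frame with the question's three columns. The data here is synthetic (random, hypothetical values), not the asker's data: `kmeans()` groups points only by their coordinates, so nothing constrains the per-cluster expense totals.

```r
# Synthetic data only (hypothetical values), in the question's column layout
set.seed(1)
n <- 1000
df <- data.frame(
  latitude  = runif(n, 600000, 620000),
  longitude = runif(n, 2210000, 2280000),
  expenses  = rexp(n, rate = 1 / 300)
)
k <- 3

# k-means uses the coordinates only; expenses play no role in the assignment
km <- kmeans(df[, c("latitude", "longitude")], centers = k, nstart = 10)

# Total expenses per cluster: nothing forces these totals to be equal
totals <- tapply(df$expenses, km$cluster, sum)
print(totals)
print(max(totals) - min(totals)) # spread between the richest and poorest cluster
```

The spread printed at the end is what the balancing requirement asks to drive toward zero.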
Do you know an algorithm that does this? I usually work in Python or R.
Thanks.
Posted on 2020-03-11 01:42:29
This is not a perfect solution, just brute force. What you are asking for is not an easy computational problem (you can read up on the multi problem).
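One reading of why this is hard (an assumption, since the answer only says "multi problem"): if location is ignored, splitting values into k groups with equal sums is the multiway number partitioning problem, which is NP-hard in general. The tiny sketch below compares the greedy "add to the lowest total" rule used in the code that follows with an exhaustive search. The helper names `greedy_totals` and `best_spread` are hypothetical, and like the answer's code they look only at expenses, not at latitude/longitude.

```r
# Greedy rule: put each value into the cluster with the lowest running total.
# With positive values this matches seeding cluster r with row r.
greedy_totals <- function(x, k) {
  totals <- numeric(k)
  for (v in x) {
    j <- which.min(totals)
    totals[j] <- totals[j] + v
  }
  totals
}

# Exhaustive search over all k^length(x) assignments (only feasible for tiny inputs);
# returns the smallest achievable max - min spread of the cluster totals
best_spread <- function(x, k) {
  grid <- as.matrix(expand.grid(rep(list(1:k), length(x))))
  spreads <- apply(grid, 1, function(a) {
    sums <- vapply(1:k, function(j) sum(x[a == j]), numeric(1))
    max(sums) - min(sums)
  })
  min(spreads)
}

x <- c(3, 3, 2, 2, 2)
greedy_totals(x, 2) # 7 5, spread 2
best_spread(x, 2)   # 0, from {3, 3} and {2, 2, 2}
```

Shuffling the input order and retrying, as the answer does, can only find orderings for which greedy happens to do well; it does not guarantee the optimum.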
Below is my brute-force R solution, assuming your data is in a data frame called df:
k <- 3 # define how many clusters you want

# A really simple greedy clustering algorithm: seed cluster r with row r,
# then add each remaining row to the cluster with the lowest expense total so far.
# Assumes nrow(df) > k and a numeric `expenses` column.
clustering <- function(df, k) {
  totals <- df$expenses[1:k]
  assignment <- integer(nrow(df))
  assignment[1:k] <- 1:k
  for (i in (k + 1):nrow(df)) {
    minimo <- which.min(totals) # cluster with the lowest running total
    assignment[i] <- minimo
    totals[minimo] <- totals[minimo] + df$expenses[i]
  }
  return(split(df, assignment)) # list of k data frames, one per cluster
}

# calculate the difference between the highest and lowest cluster totals
distance <- function(clusters) {
  totals <- sapply(clusters, function(cl) sum(cl$expenses))
  return(max(totals) - min(totals))
}

# repeat the process with a different row order (and therefore different seeds)
# and keep the clustering whose totals have the smallest max - min spread
Clusters <- clustering(df, k)
max.distance <- distance(Clusters)
for (i in 2:50) {
  df <- df[sample(nrow(df)), ] # shuffle the rows
  candidate <- clustering(df, k)
  g <- distance(candidate)
  if (max.distance > g) {
    max.distance <- g
    Clusters <- candidate
  }
}

https://stackoverflow.com/questions/60618667