
Q: Grouping a DataFrame by distance to data points in a different DataFrame, using long and lat

Stack Overflow user

Asked 2021-11-03 09:11:18

1 answer, 37 views, 0 followers, 0 votes

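I have two DataFrames. One contains several power plants with their locations given as longitude and latitude, each in its own column. The other DataFrame contains several substations, also with long and lat. What I want to do is assign each power plant to its nearest substation.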

df1 = pd.DataFrame({'ID_pp':['p1','p2','p3','p4'],'x':[12.644881,11.563269, 12.644881,  8.153184], 'y':[48.099206, 48.020081, 48.099206, 49.153766]})
df2 = pd.DataFrame({'ID_ss':['s1','s2','s3','s4'],'x':[9.269, 9.390, 9.317, 10.061], 'y':[55.037, 54.940, 54.716, 54.349]})

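I think I need to compute the distances between all points and then group the DataFrame, but I am not sure how to go about it. I found the numpy.linalg.norm() function, but it does not fit my case. Any help is much appreciated.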

I found this solution, which is basically what I need:

import pandas as pd
import geopy.distance

for i, row in df1.iterrows():  # A: each power plant
    a = row.y, row.x  # geopy expects (latitude, longitude)
    distances = []
    for j, row2 in df2.iterrows():  # B: each substation
        b = row2.y, row2.x
        distances.append(geopy.distance.geodesic(a, b).km)

    # smallest distance and the position of the matching substation in df2
    min_distance = min(distances)
    min_index = distances.index(min_distance)

    print("A", i, "is closest to B", min_index, min_distance, "km")

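It works, but it takes forever, and my dataset is very large. I think an approach using .apply might be faster. Does anyone know how to turn this into an .apply-based approach?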

Tags: python, pandas, dataframe, numpy

1 Answer

Stack Overflow user

Answered 2021-11-03 20:58:59

Here is a solution using geopandas. I don't know how well it scales to larger datasets.

import geopandas as gpd
import pandas as pd

df1 = pd.DataFrame({'ID_pp':['p1','p2','p3','p4'],'x':[12.644881,11.563269, 12.644881,  8.153184], 'y':[48.099206, 48.020081, 48.099206, 49.153766]})
df2 = pd.DataFrame({'ID_ss':['s1','s2','s3','s4'],'x':[9.269, 9.390, 9.317, 10.061], 'y':[55.037, 54.940, 54.716, 54.349]})

# create GeoDataFrames from the original dfs
gdf1 = gpd.GeoDataFrame(df1[['ID_pp']], geometry=gpd.points_from_xy(df1['x'], df1['y']), crs='EPSG:4326')
gdf2 = gpd.GeoDataFrame(df2[['ID_ss']], geometry=gpd.points_from_xy(df2['x'], df2['y']), crs='EPSG:4326')

# convert to another coordinate reference system for units in metres, EPSG:5243 suits Germany as far as I know 
gdf1 = gdf1.to_crs('EPSG:5243')
gdf2 = gdf2.to_crs('EPSG:5243')

gdf2 = gdf2.set_index('ID_ss')

def get_closest_ss(point, other):
    s = other.distance(point)
    return (s.idxmin(), s.min())

# find ID of closest substation to all power plants
gdf1[['closest_ss', 'distance']] = gdf1.geometry.apply(get_closest_ss, args=(gdf2,)).to_list()

# merge the dataframe with the power plants (gdf1) with the closest substation (gdf2)
gdf = gdf1.merge(gdf2, left_on='closest_ss', right_index=True, suffixes=('', '_ss'))

print(gdf)

# output

  ID_pp                         geometry closest_ss       distance  \
0    p1   POINT (159807.847 -320153.333)         s4  717896.945731   
1    p2    POINT (79356.344 -330713.037)         s4  711534.096071   
2    p3   POINT (159807.847 -320153.333)         s4  717896.945731   
3    p4  POINT (-171106.060 -202478.708)         s4  592470.679838   

                     geometry_ss  
0  POINT (-28563.516 372589.227)  
1  POINT (-28563.516 372589.227)  
2  POINT (-28563.516 372589.227)  
3  POINT (-28563.516 372589.227)
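
Since the question mentions a very large dataset, one possible alternative is to compute all pairwise distances at once with NumPy broadcasting instead of calling a function per row. The following is only a sketch under stated assumptions: it uses the haversine formula on a spherical Earth (so values differ slightly from the geodesic and projected distances above), and the helper name `haversine_matrix` and the column names `closest_ss` and `distance_km` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical approximation)

def haversine_matrix(lon1, lat1, lon2, lat2):
    """Great-circle distance in km from every point in set 1 to every point in set 2."""
    # Reshape so broadcasting yields an (n, m) matrix of all pairs.
    lon1, lat1 = np.radians(lon1)[:, None], np.radians(lat1)[:, None]
    lon2, lat2 = np.radians(lon2)[None, :], np.radians(lat2)[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

df1 = pd.DataFrame({'ID_pp': ['p1', 'p2', 'p3', 'p4'],
                    'x': [12.644881, 11.563269, 12.644881, 8.153184],
                    'y': [48.099206, 48.020081, 48.099206, 49.153766]})
df2 = pd.DataFrame({'ID_ss': ['s1', 's2', 's3', 's4'],
                    'x': [9.269, 9.390, 9.317, 10.061],
                    'y': [55.037, 54.940, 54.716, 54.349]})

# x is longitude, y is latitude
dist = haversine_matrix(df1['x'].to_numpy(), df1['y'].to_numpy(),
                        df2['x'].to_numpy(), df2['y'].to_numpy())

# Position of the nearest substation for each power plant
nearest = dist.argmin(axis=1)

result = df1.assign(
    closest_ss=df2['ID_ss'].to_numpy()[nearest],
    distance_km=dist[np.arange(len(df1)), nearest],
)
print(result)
```

Note that the full distance matrix needs memory proportional to the number of power plants times the number of substations, so for very large inputs it would probably have to be processed in chunks of rows.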

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/69822240
