Q: Determine the closest station and select another variable from that location

Stack Overflow user

Asked 2019-07-17 01:33:40


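I have study sites where I collect data, and there are nearby weather stations with information on temperature and precipitation. I want to pair the daily data from each study site with the weather information from the closest weather station. I think this needs a two-step process: first select the weather station closest to the study site, then create a new variable holding that station's weather data.

Here is a snapshot of my data: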

# study sites
site <- rep(LETTERS[1:3], 5)
siteLat <- rep(c(41, 42, 44), 5)
siteLon <- rep(c(68, 62, 63), 5)
siteDate <- rep(1:5, 3)
# build the data frame directly: wrapping the vectors in cbind() first
# would coerce every column, including the coordinates, to character
dfSites <- data.frame(site, siteLat, siteLon, siteDate)

# weather stations
station <- rep(letters[1:3], 5)
stationLat <- rep(c(40, 43, 45), 5)
stationLon <- rep(c(67, 61, 64), 5)
stationDate <- rep(1:5, 3)
temp <- sample(10:20, 15, replace=TRUE)
dfStation <- data.frame(station, stationLat, stationLon, stationDate, temp)

I tried using this line to determine which station is closest, but I only get a single row of distances.

distVincentyEllipsoid(df2[c("recvLon", "recvLat")], weather[c("lon", "lat")])

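Once all the distances are calculated, I am a bit unsure of the next steps, but I think I need something that selects the closest station and matches the dates. This is the best I have come up with: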

dfSites %>% 
    mutate(closestStation = ???,
           temp1 = temp[station == closestStation & stationDate == siteDate])

The end result would be my study site data frame with an additional temperature column taken from the closest weather station.

dplyr

geosphere

1 Answer

Stack Overflow user

Answered 2019-07-17 01:51:19

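I think distVincentyEllipsoid(p1, p2, ...) tries to find the distance between the first point of p1 and the first point of p2, the second point of p1 and the second point of p2, and so on. What you need is an expansion: the first point in p1 with all of p2, the second point in p1 with all of p2, and so on.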
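To make the difference concrete, here is a minimal sketch contrasting the element-wise call with an all-pairs call. The coordinates are toy values taken from the question's data, the variable names are illustrative, and geosphere's distm() is used here as one way to get every site against every station; it is an alternative to, not a copy of, the approach in this answer.

```r
library(geosphere)

# Two sites and three stations as (lon, lat) matrices (toy values)
sites    <- cbind(lon = c(68, 62), lat = c(41, 42))
stations <- cbind(lon = c(67, 61, 64), lat = c(40, 43, 45))

# Element-wise: inputs of equal length give one distance per row pair
pairwise <- distVincentyEllipsoid(sites, stations[1:2, ])

# All pairs: distm() returns one row per site and one column per station
allpairs <- distm(sites, stations, fun = distVincentyEllipsoid)

dim(allpairs)

# Index of the closest station for each site (row-wise minimum)
nearest <- apply(allpairs, 1, which.min)
nearest
```

The element-wise result corresponds to the diagonal of the all-pairs matrix, which is why the original call returned only one row of distances.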

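Adjusting your code to call dfSites and dfStation (instead of df2/weather), the following should work for you. (I will use dfStation[-1,...] to drop one of the station rows, just to make it clear which dimension represents sites and which represents stations.)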

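library(geosphere)

# For each site row, compute the distance to every remaining station row;
# sapply() binds the results as columns (rows = stations, columns = sites)
alldists <- sapply(seq_len(nrow(dfSites)), function(i) {
  distVincentyEllipsoid(dfSites[i,c("siteLon","siteLat")],
                        dfStation[-1,c("stationLon","stationLat")])
})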
alldists
#           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
#  [1,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
#  [2,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
#  [3,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
#  [4,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
#  [5,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
#  [6,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
#  [7,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
#  [8,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
#  [9,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
# [10,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
# [11,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
# [12,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
# [13,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
# [14,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
#           [,9]    [,10]    [,11]    [,12]    [,13]    [,14]    [,15]
#  [1,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
#  [2,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
#  [3,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
#  [4,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
#  [5,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
#  [6,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
#  [7,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
#  [8,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
#  [9,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
# [10,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
# [11,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
# [12,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
# [13,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
# [14,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2

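(Because there are 14 rows, each row is one of your station rows and each column is one of your site rows. You should not use the [-1,] indexing yourself; it is only there to show which dimension is which.) From this we can read, for example, that the entry in the first column, second row is 481351.6 meters: the distance between site A and the second remaining station row, which is row 3 of dfStation (station c) because [-1,] dropped the first row.

From here, just find the column-wise minimum: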

apply(alldists, 2, which.min)
#  [1] 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2

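This says that for site A (the first column) the closest station is in the third row of the matrix; with the [-1,] offset used above, that is row 4 of dfStation, station a. (which.min returns the first minimum; it does not report ties.)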
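The tie behaviour can be checked in base R. This is a small sketch with a made-up distance vector; which(x == min(x)) is shown as one possible way to list every tied position if that matters for your data.

```r
# Made-up distances with a tie for the minimum at positions 2 and 3
x <- c(5, 2, 2, 7)

# which.min() reports only the first position of the minimum
first_min <- which.min(x)

# which() with an equality test lists every tied position
all_min <- which(x == min(x))

first_min
all_min
```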

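Now dfStation[apply(alldists, 2, which.min),] gives you 15 rows of station data (provided alldists was computed against the full dfStation, without the [-1,] offset), which can easily be cbind-ed or otherwise combined with dfSites.

A dplyr option: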
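The question also asks for the temperature on the matching date, which a plain cbind does not handle. Below is a hedged sketch of one way to do both steps. It rebuilds the example data with numeric coordinates, reduces dfStation to one row per station so that indices map to station names, and then joins on station and date. The names stationLoc, d, and result, the set.seed() call, and the use of distm() are assumptions made for this illustration and are not part of the original answer.

```r
library(geosphere)
library(dplyr)

set.seed(1)  # added only so the sampled temperatures are reproducible

dfSites <- data.frame(
  site = rep(LETTERS[1:3], 5),
  siteLat = rep(c(41, 42, 44), 5),
  siteLon = rep(c(68, 62, 63), 5),
  siteDate = rep(1:5, 3),
  stringsAsFactors = FALSE
)
dfStation <- data.frame(
  station = rep(letters[1:3], 5),
  stationLat = rep(c(40, 43, 45), 5),
  stationLon = rep(c(67, 61, 64), 5),
  stationDate = rep(1:5, 3),
  temp = sample(10:20, 15, replace = TRUE),
  stringsAsFactors = FALSE
)

# One row per distinct station location
stationLoc <- distinct(dfStation, station, stationLon, stationLat)

# Distance matrix: rows = site rows, columns = distinct stations
d <- distm(as.matrix(dfSites[, c("siteLon", "siteLat")]),
           as.matrix(stationLoc[, c("stationLon", "stationLat")]),
           fun = distVincentyEllipsoid)

result <- dfSites %>%
  # Step 1: name of the closest station for each site row
  mutate(closestStation = stationLoc$station[apply(d, 1, which.min)]) %>%
  # Step 2: pull temp from that station on the matching date
  left_join(select(dfStation, station, stationDate, temp),
            by = c("closestStation" = "station", "siteDate" = "stationDate"))
```

Joining on both keys means a site row gets NA for temp if its closest station has no record for that date, which makes gaps in the station data visible instead of silently misaligned.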

library(dplyr)

dfSites %>%
  mutate(
    # geosphere expects (lon, lat) order for both sets of points
    station_i = purrr::map2_int(
      siteLon, siteLat,
      ~ which.min(geosphere::distVincentyEllipsoid(
          cbind(.x,.y), dfStation[,c("stationLon","stationLat")]))
      ),
    # station_i indexes rows of the full dfStation (no [-1,] offset here)
    station = as.character(dfStation$station)[ station_i ]
  )
#    site siteLat siteLon siteDate station_i station
# 1     A      41      68        1         1       a
# 2     B      42      62        2         2       b
# 3     C      44      63        3         3       c
# 4     A      41      68        4         1       a
# 5     B      42      62        5         2       b
# 6     C      44      63        1         3       c
# 7     A      41      68        2         1       a
# 8     B      42      62        3         2       b
# 9     C      44      63        4         3       c
# 10    A      41      68        5         1       a
# 11    B      42      62        1         2       b
# 12    C      44      63        2         3       c
# 13    A      41      68        3         1       a
# 14    B      42      62        4         2       b
# 15    C      44      63        5         3       c

You can see a slight (10-15%) speed improvement by doing an outer product over them instead.

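# select coordinates by name so both inputs are in (lon, lat) order
outer(seq_len(nrow(dfSites)), seq_len(nrow(dfStation)),
      function(i,j) geosphere::distVincentyEllipsoid(
        dfSites[i,c("siteLon","siteLat")], dfStation[j,c("stationLon","stationLat")]))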

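This also returns an m x n matrix (here with sites on the rows), and apply(...) then runs over the rows of that matrix to get the index of the closest station. (I had hoped for a larger performance gain, since distVincentyEllipsoid is called only once ...)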
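The "called only once" point follows from how base R's outer() works: it expands the two index vectors to all combinations and passes them to the function in a single vectorized call. A minimal base-R sketch with a hypothetical counter (calls) and a toy function in place of the distance calculation:

```r
calls <- 0  # counter for how many times the function is invoked

res <- outer(1:3, 1:4, function(i, j) {
  calls <<- calls + 1   # runs once: i and j each arrive with length 12
  i * 10 + j            # toy stand-in for a vectorized distance function
})

calls      # number of invocations
dim(res)   # rows follow the first argument, columns the second
```

This is also why the function passed to outer() must be vectorized over its two arguments, as distVincentyEllipsoid is for row-aligned inputs.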


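Original page content provided by Stack Overflow. Translation support for the source page was provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: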

https://stackoverflow.com/questions/57062683
