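I am trying to extract values from data subject to several conditions. The data come as one file per month, and the observations in it are not continuous. The data look like this: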
measure value
1 Station identifier WAML
2 Station number 97072
3 Observation time 150101/0000
...
27 Mean mixed layer potential temperature 298.68
28 Mean mixed layer mixing ratio 16.77
29 1000 hPa to 500 hPa thickness 5773.00
30 Precipitable water [mm] for entire sounding 55.86
31 Station identifier WAML
32 Station number 97072
33 Observation time 150109/1200
...
57 Mean mixed layer potential temperature 300.78
58 Mean mixed layer mixing ratio 16.29
59 1000 hPa to 500 hPa thickness 5784.00
60 Precipitable water [mm] for entire sounding 52.46
61 Station identifier WAML
62 Station number 97072
63 Observation time 150110/0000
...
87 Mean mixed layer potential temperature 297.48
88 Mean mixed layer mixing ratio 16.55
89 1000 hPa to 500 hPa thickness 5760.00
90 Station identifier WAML
91 Station number 97072
92 Observation time 150110/1200
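...
I would like to filter the data by "Observation time" and "Precipitable water [mm] for entire sounding" so that I get only those values. Note that some observations have no precipitable water value, only the observation time and the other parameters.
I tried:
df1 <- dplyr::filter(obs.tpw, grepl(paste(c("Observation time", "Precipitable water [mm] for entire sounding"), collapse = "&"), paste(measure, value, sep = "_")))
but the result contains no data.
How can I get only the observation time and precipitable water values, and then put them in order? The observation time has the form 'date'/'time', where 150101 is (year)(month)(day) and the part after the slash is (hour)(minute). The data I have are not sorted by date and time. For example, the first observation time is 150101/0000 and the second is 150109/1200, but the second should be 150101/1200, because there are two observations per day (0000 and 1200).
The final data I want look like this: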
measure value
1 Observation time 150101/0000
2 Precipitable water [mm] for entire sounding 55.86
3 Observation time 150101/1200
4 Precipitable water [mm] for entire sounding 52.46
5 Observation time 150102/0000
6 Precipitable water [mm] for entire sounding 61.15
7 Observation time 150102/1200
8 Precipitable water [mm] for entire sounding 55.93
9 Observation time 150103/0000
10 Precipitable water [mm] for entire sounding 52.25
11 Observation time 150103/1200
12 Precipitable water [mm] for entire sounding 61.48
13 Observation time 150104/0000
14 Precipitable water [mm] for entire sounding NA
15 Observation time 150104/1200
16 Precipitable water [mm] for entire sounding 61.92
17 Observation time 150105/0000
18 Precipitable water [mm] for entire sounding NA
19 Observation time 150105/1200
20 Precipitable water [mm] for entire sounding 57.42
Posted on 2020-03-26 08:28:28
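I have made the following assumptions, since these points are not clear from the question (if the answer turns out to be wrong, I will revise it as needed):
Each observation is identified by the combination of Station identifier, Station number and Observation time.
I know nothing about the date-time format used in Observation time, but I am guessing it is something like 'date'/'time', where 'date' is an integer serial that refers to the number of days after some reference date.
First, try to include a reproducible dataset in questions like this, or link to publicly available data: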
# Create Reproducible Dataset ---------------------------------------------
measure <- c("Station identifier",
             "Station number",
             "Observation time", "Mean mixed layer potential temperature",
             "Mean mixed layer mixing ratio", "1000 hPa to 500 hPa thickness",
             "Precipitable water [mm] for entire sounding", "Station identifier",
             "Station number", "Observation time",
             "Mean mixed layer potential temperature",
             "Mean mixed layer mixing ratio", "1000 hPa to 500 hPa thickness",
             "Precipitable water [mm] for entire sounding", "Station identifier",
             "Station number", "Observation time",
             "Mean mixed layer potential temperature",
             "Mean mixed layer mixing ratio",
             "1000 hPa to 500 hPa thickness", "Station identifier",
             "Station number", "Observation time")
value <- c("WAML", "97072", "150101/0000", "298.68", "16.77", "5773.00", "55.86",
           "WAML", "97072", "150109/1200", "300.78", "16.29", "5784.00", "52.46",
           "WAML", "97072", "150110/0000", "297.48", "16.55", "5760.00", "WAML",
           "97072", "150110/1200")
df <- data.frame(measure = measure, value = value, stringsAsFactors = FALSE)
Now, to answer your question:
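One likely reason the filter() call in the question returns no rows: collapsing the two names with "&" builds a single regular expression in which "&" is a literal character rather than a logical AND, and "[mm]" is read as a character class that matches one "m". The sketch below illustrates this on a toy vector (the vector and the variable names are made up for illustration) and shows exact matching with %in% as one alternative.

```r
# Toy column mirroring some `measure` names from the question
measure <- c("Station identifier",
             "Observation time",
             "Precipitable water [mm] for entire sounding")

# The attempted pattern: "&" is a literal character, not a logical AND
pattern <- paste(c("Observation time",
                   "Precipitable water [mm] for entire sounding"),
                 collapse = "&")
original_hits <- grepl(pattern, measure)

# Exact matching against the wanted names avoids regular expressions entirely
wanted <- c("Observation time",
            "Precipitable water [mm] for entire sounding")
exact_hits <- measure %in% wanted

# As a regex, "[mm]" is a character class; fixed = TRUE matches it literally
regex_hits <- grepl("Precipitable water [mm]", measure)
fixed_hits <- grepl("Precipitable water [mm]", measure, fixed = TRUE)

print(data.frame(measure, original_hits, exact_hits, regex_hits, fixed_hits))
```

Exact matching on the measure column is also what the solution relies on, so no escaping of the square brackets is needed there.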
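# Solution ----------------------------------------------------------------
library(dplyr)  # attaches the %>% pipe used below
# Create index of rows where `measure == "Station identifier"`
idx <- which(df$measure == "Station identifier")
df %>%
  # Create a unique identifier for each block of rows that starts at a station identifier
  dplyr::mutate(station_id = cut(seq_len(nrow(df)),
                                 c(idx, nrow(df)),
                                 right = FALSE,
                                 include.lowest = TRUE)) %>%
  dplyr::filter(measure %in% c("Observation time",
                               "Precipitable water [mm] for entire sounding")) %>%
  # Turn each value in measure into a new column
  tidyr::pivot_wider(names_from = "measure", values_from = "value") %>%
  # Inelegant way of sorting by date and time
  dplyr::mutate(ot = as.numeric(sub("\\/", ".", `Observation time`))) %>%
  dplyr::arrange(ot) %>%
  dplyr::select(-ot) %>%
  # Drop observations that have no precipitable water value
  tidyr::drop_na()
Finally, I would like to point out that while you may well be able to parse and analyse these data with tidyverse packages, if your field requires working with geospatial, spatio-temporal or atmospheric data on a regular basis, there already seems to be a large collection of R packages built specifically for that purpose. I have absolutely no experience in this area, but from a brief search the spacetime package on CRAN looks promising, as it may be able to handle data in this format. Another potentially useful resource is the introduction by Pebesma.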
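As a follow-up to the pipeline above: it ends with drop_na(), so soundings without a precipitable water value are removed, whereas the desired output in the question keeps them as NA. The base R sketch below is one possible way to keep those rows and sort by a parsed date-time. It assumes the format stated in the question, (year)(month)(day)/(hour)(minute), and UTC as the time zone. The helper pick() and the names wide and long are made up for illustration, and the toy rows reuse values from the question in shuffled order so that the sort has something to do.

```r
# Small stand-in for the question's data (rows shuffled on purpose)
df <- data.frame(
  measure = c("Station identifier", "Observation time",
              "Precipitable water [mm] for entire sounding",
              "Station identifier", "Observation time",
              "Station identifier", "Observation time",
              "Precipitable water [mm] for entire sounding"),
  value = c("WAML", "150109/1200", "52.46",
            "WAML", "150110/0000",
            "WAML", "150101/0000", "55.86"),
  stringsAsFactors = FALSE
)
pw_name <- "Precipitable water [mm] for entire sounding"

# Each "Station identifier" row starts a new sounding
sounding <- cumsum(df$measure == "Station identifier")

# Hypothetical helper: one value per sounding, NA when the measure is absent
pick <- function(name) {
  unname(vapply(split(df, sounding), function(block) {
    hit <- block$value[block$measure == name]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1)))
}

wide <- data.frame(obs_time = pick("Observation time"),
                   pw_mm = as.numeric(pick(pw_name)),
                   stringsAsFactors = FALSE)

# Assumed format: two-digit year, month, day, then hour and minute (UTC assumed)
wide$parsed <- as.POSIXct(wide$obs_time, format = "%y%m%d/%H%M", tz = "UTC")
wide <- wide[order(wide$parsed), ]
rownames(wide) <- NULL

# Back to the alternating measure/value layout shown in the question
long <- data.frame(
  measure = rep(c("Observation time", pw_name), times = nrow(wide)),
  value = as.vector(rbind(wide$obs_time, as.character(wide$pw_mm))),
  stringsAsFactors = FALSE
)
print(long)
```

Parsing into a date-time instead of replacing the slash with a decimal point keeps the ordering correct across month and year boundaries and makes it easier to spot missing 0000 or 1200 soundings later.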
I hope this is helpful.
https://stackoverflow.com/questions/60861841