Question: Selecting rows by condition in R

Stack Overflow user

Asked on 2020-03-26 06:11:51

1 answer, 60 views, 0 followers, 0 votes

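I am trying to get values from my data based on several conditions. Each file I have covers one month, and the data are not continuous. The data look like this: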

                                        measure       value
1                            Station identifier        WAML
2                                Station number       97072
3                              Observation time 150101/0000
...
27       Mean mixed layer potential temperature      298.68
28                Mean mixed layer mixing ratio       16.77
29                1000 hPa to 500 hPa thickness     5773.00
30  Precipitable water [mm] for entire sounding       55.86
31                           Station identifier        WAML
32                               Station number       97072
33                             Observation time 150109/1200
...
57       Mean mixed layer potential temperature      300.78
58                Mean mixed layer mixing ratio       16.29
59                1000 hPa to 500 hPa thickness     5784.00
60  Precipitable water [mm] for entire sounding       52.46
61                           Station identifier        WAML
62                               Station number       97072
63                             Observation time 150110/0000
...
87       Mean mixed layer potential temperature      297.48
88                Mean mixed layer mixing ratio       16.55
89                1000 hPa to 500 hPa thickness     5760.00
90                           Station identifier        WAML
91                               Station number       97072
92                             Observation time 150110/1200
...

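I want to filter the data by "Observation time" and "Precipitable water [mm] for entire sounding" so that I can get their values. However, some observations have no precipitable-water record: only the observation time and the other parameters are present.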

I tried:

df1 <-  dplyr::filter(obs.tpw, grepl(paste(c("Observation time", "Precipitable water [mm] for entire sounding"), collapse = "&"), paste(measure, value, sep = "_")))

but the result contains no rows.
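The attempt above returns nothing for two reasons, reproduced here on a hypothetical three-row toy data frame. First, `paste(..., collapse = "&")` builds one pattern that requires a literal `&` between the two names (regex alternation would be `|`), and no single row contains both names anyway. Second, the square brackets in `[mm]` form a regular-expression character class, so the pattern does not match the literal text `[mm]` unless `fixed = TRUE` is used. Since the measure names are known exactly, matching them with `%in%` sidesteps regular expressions entirely.

```r
library(dplyr)

# Hypothetical toy data in the question's long "measure/value" layout
obs.tpw <- data.frame(
  measure = c("Station identifier", "Observation time",
              "Precipitable water [mm] for entire sounding"),
  value   = c("WAML", "150101/0000", "55.86"),
  stringsAsFactors = FALSE
)

wanted <- c("Observation time", "Precipitable water [mm] for entire sounding")

# Original approach: one pattern joined by a literal "&"; no row contains it
pattern <- paste(wanted, collapse = "&")
n_regex <- nrow(dplyr::filter(obs.tpw, grepl(pattern, paste(measure, value, sep = "_"))))

# "[mm]" is a regex character class, so the literal text only matches with fixed = TRUE
bracket_regex <- grepl("water [mm]", wanted[2])                 # FALSE
bracket_fixed <- grepl("water [mm]", wanted[2], fixed = TRUE)   # TRUE

# Exact matching on the measure column keeps both kinds of rows
kept <- dplyr::filter(obs.tpw, measure %in% wanted)
print(kept)
```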

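How can I get only the values of the observation time and the precipitable-water parameter, and then put them in order? The observation time has the form 'date'/'time': 150101/0000 means (year)(month)(day)/(hour)(minute), i.e. yymmdd/hhmm. My data are not sorted by date and time. For example, the first observation time is 150101/0000 and the second is 150109/1200, but the second should be 150101/1200, because there are two observations per day (0000 and 1200).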
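If the observation time really is `yymmdd/hhmm` as described (an assumption taken from the text above, with times treated as UTC), base R's `as.POSIXct()` can parse it into a proper date-time, and `order()` on the parsed values gives the chronological order. The vector `obs_time` is a hypothetical sample built from times that appear in the question.

```r
# Hypothetical sample of observation times, out of order
obs_time <- c("150109/1200", "150101/0000", "150110/0000", "150101/1200")

# Assumption: "yymmdd/hhmm", interpreted in UTC; %y maps "15" to 2015
parsed <- as.POSIXct(obs_time, format = "%y%m%d/%H%M", tz = "UTC")

# Reorder the original strings chronologically
sorted_time <- obs_time[order(parsed)]
print(sorted_time)
```

Because both parts are fixed-width digit strings, a plain character `sort()` would give the same order here; parsing mainly pays off when you need real date arithmetic, such as checking which 12-hourly slots are missing.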

The final data I want look like this:

                                       measure       value
1                             Observation time 150101/0000
2  Precipitable water [mm] for entire sounding       55.86
3                             Observation time 150101/1200
4  Precipitable water [mm] for entire sounding       52.46
5                             Observation time 150102/0000
6  Precipitable water [mm] for entire sounding       61.15
7                             Observation time 150102/1200
8  Precipitable water [mm] for entire sounding       55.93
9                             Observation time 150103/0000
10 Precipitable water [mm] for entire sounding       52.25
11                            Observation time 150103/1200
12 Precipitable water [mm] for entire sounding       61.48
13                            Observation time 150104/0000
14 Precipitable water [mm] for entire sounding          NA
15                            Observation time 150104/1200
16 Precipitable water [mm] for entire sounding       61.92
17                            Observation time 150105/0000
18 Precipitable water [mm] for entire sounding          NA
19                            Observation time 150105/1200
20 Precipitable water [mm] for entire sounding       57.42

Tags: filter

1 answer

Stack Overflow user

Accepted answer

Answered on 2020-03-26 08:28:28

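I have made the following assumptions, since they are not clear from your question (if they turn out to be wrong, I'll revise my answer accordingly):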

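- A unique observation is identified by the combination of Station identifier, Station number, and Observation time.
- Every observation contains these three identifiers, and they always appear in the same order before the data belonging to that observation.
- I know nothing about the date-time format used in Observation time, but I guess it is something like 'date'/'time', where 'date' is an integer sequence counting the days after some reference date.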
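Under the second assumption, every "Station identifier" row opens a new observation, so a running count of those rows labels each row with the observation it belongs to. A minimal base-R sketch of that idea on a hypothetical trimmed toy data frame `toy`; it is an alternative to building intervals with `cut()`, not the method used in this answer.

```r
# Hypothetical trimmed copy of the question's layout: two observations
toy <- data.frame(
  measure = c("Station identifier", "Station number", "Observation time",
              "Precipitable water [mm] for entire sounding",
              "Station identifier", "Station number", "Observation time"),
  value   = c("WAML", "97072", "150101/0000", "55.86",
              "WAML", "97072", "150109/1200"),
  stringsAsFactors = FALSE
)

# TRUE marks the start of a block; cumsum() turns that into 1, 1, ..., 2, 2, ...
toy$obs_id <- cumsum(toy$measure == "Station identifier")
print(toy)
```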

First, try to include a reproducible dataset in questions like this, or link to publicly available data:

# Create Reproducible Dataset ---------------------------------------------
measure <- c("Station identifier", 
             "Station number", 
             "Observation time", "Mean mixed layer potential temperature", 
             "Mean mixed layer mixing ratio", "1000 hPa to 500 hPa thickness",
             "Precipitable water [mm] for entire sounding", "Station identifier", 
             "Station number", "Observation time", 
             "Mean mixed layer potential temperature",
             "Mean mixed layer mixing ratio", "1000 hPa to 500 hPa thickness", 
             "Precipitable water [mm] for entire sounding", "Station identifier", 
             "Station number", "Observation time", 
             "Mean mixed layer potential temperature", 
             "Mean mixed layer mixing ratio", 
             "1000 hPa to 500 hPa thickness", "Station identifier", 
             "Station number", "Observation time")
value <- c("WAML", "97072", "150101/0000", "298.68", "16.77", "5773.00", "55.86", 
           "WAML", "97072", "150109/1200", "300.78", "16.29", "5784.00", "52.46", 
           "WAML", "97072", "150110/0000", "297.48", "16.55", "5760.00", "WAML", 
           "97072", "150110/1200")
df <- data.frame(measure = measure, value = value, stringsAsFactors = FALSE)
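A quick way to produce such a snippet from your own data is base R's `dput()`, which prints R code that rebuilds the object exactly. A minimal sketch using a hypothetical two-row data frame `small` standing in for your data:

```r
# Hypothetical small data frame standing in for your own data
small <- data.frame(
  measure = c("Observation time", "Precipitable water [mm] for entire sounding"),
  value   = c("150101/0000", "55.86"),
  stringsAsFactors = FALSE
)

# Prints R code that recreates `small`; that printout can be pasted into a question
dput(small)

# Evaluating the printed code gives back an identical object
txt <- paste(capture.output(dput(small)), collapse = "\n")
restored <- eval(parse(text = txt))
```

For a large data frame, `dput(head(your_data, 30))` keeps the snippet to a manageable size.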

Now, on to your question:

# Solution ----------------------------------------------------------------

# `%>%` and the pivot functions need these packages
library(dplyr)
library(tidyr)

# Create index of rows where `measure == "Station identifier"`
idx <- which(df$measure == "Station identifier")

df %>% 
    # Label each row with the observation block it belongs to
    dplyr::mutate(station_id = cut(1:nrow(df), 
                                   c(idx, nrow(df)),
                                   right = FALSE, 
                                   include.lowest = TRUE)) %>% 
    dplyr::filter(measure %in% c("Observation time", 
                                 "Precipitable water [mm] for entire sounding")) %>% 
    # Turn each value in measure into a new column
    tidyr::pivot_wider(names_from = "measure", values_from = "value") %>% 
    # Inelegant way of sorting by date and time: "150101/0000" becomes 150101.0000
    dplyr::mutate(ot = as.numeric(sub("\\/", ".", `Observation time`))) %>% 
    dplyr::arrange(ot) %>% 
    dplyr::select(-ot) %>% 
    # Drops observations without a precipitable-water value; remove this line to keep them as NA
    tidyr::drop_na()
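The final `drop_na()` removes observations that have no precipitable-water value, whereas the desired output in the question keeps them as `NA` in a long, two-rows-per-observation layout. Below is a sketch of one way to get that shape, assuming tidyr 1.0 or later (for `pivot_wider()`/`pivot_longer()`), the `yymmdd/hhmm` time format stated in the question, and a hypothetical trimmed, reordered toy dataset in which the `150110/0000` observation has no precipitable-water row (as in the question's data). It labels observations with `cumsum()` instead of `cut()` and sorts on a parsed date-time.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three observations, out of order, one without precipitable water
toy <- data.frame(
  measure = c("Station identifier", "Observation time",
              "Precipitable water [mm] for entire sounding",
              "Station identifier", "Observation time",
              "Station identifier", "Observation time",
              "Precipitable water [mm] for entire sounding"),
  value   = c("WAML", "150109/1200", "52.46",
              "WAML", "150110/0000",
              "WAML", "150101/0000", "55.86"),
  stringsAsFactors = FALSE
)

result <- toy %>%
  # Each "Station identifier" row starts a new observation
  mutate(obs_id = cumsum(measure == "Station identifier")) %>%
  filter(measure %in% c("Observation time",
                        "Precipitable water [mm] for entire sounding")) %>%
  # One row per observation; a missing precipitable-water value becomes NA
  pivot_wider(id_cols = obs_id, names_from = measure, values_from = value) %>%
  # Assumption: "yymmdd/hhmm" in UTC, as described in the question
  mutate(obs_datetime = as.POSIXct(`Observation time`,
                                   format = "%y%m%d/%H%M", tz = "UTC")) %>%
  arrange(obs_datetime) %>%
  select(-obs_datetime) %>%
  # Back to two rows per observation; values_drop_na = FALSE keeps the NA rows
  pivot_longer(-obs_id, names_to = "measure", values_to = "value",
               values_drop_na = FALSE) %>%
  select(-obs_id)

print(result)
```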

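Finally, I'd point out that although you can probably parse and analyze these data well with tidyverse packages, if your field regularly involves geospatial, spatio-temporal, or atmospheric data, there already seems to be a large collection of R packages built specifically for that purpose. I have no experience in this area at all, but from a brief search, the spacetime package on CRAN looks promising, since it may be able to handle data in this format. Another resource that may be useful is the introduction by Pebesma.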

I hope this helps.

Votes: 2

Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/60861841
