
R - Splitting data into hydrological quarters

Stack Overflow user
Asked 2015-08-30 04:52:46
Answers: 1 · Views: 160 · Followers: 0 · Votes: 0

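I would like to split my data set into quarters of the year according to the definition of the hydrological year. Wikipedia says, "Due to meteorological and geographical factors, the definition of the water years varies". In the USA, the hydrological year is the period from October 1st of one year to September 30th of the next. I use the Polish definition of the hydrological year (it starts on November 1st and ends on October 31st).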

The sample data set looks as follows:

sampleData <- structure(list(date = structure(c(15946, 15947, 15875, 15910, 15869, 15888, 15823, 16059, 16068, 16067), class = "Date"),`example value` = c(-0.325806595888448, 0.116001346459147, 1.68884381116696, -0.480527505762716, -0.50307381813168,-1.12032214801472, -0.659699514672226, -0.547101497279717, 0.729148872679021,-0.769760735764215)), .Names = c("date", "example value"), row.names = c(NA, -10L), class = "data.frame")

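For some reason, the `cut` function in my code complains that `breaks` and `labels` have different lengths (but they do not). If I omit the `labels` argument of `cut` (as shown below), it works fine. What is wrong with the labels?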

ToHydroQuarters <-function(df)
{
  result <- df
  yearStart <- as.numeric(format(min(df$date),'%Y'))-1
  #Hydrological year in Poland starts at November 1st
  DateStart <- as.Date(paste(yearStart,"-11-01",sep=""))

  breaks <- seq(from=DateStart, to=max(df$date)+90, by="quarter")
  breakYear <- format(breaks,'%Y')

  #Please, do not create labels in such way.
  #Please note that for November and December we have next hydrological year - since it started at 1st November. So, we need to check month to decide which year we have (?) or use cut function again as mentioned here: http://stackoverflow.com/questions/22073881/hydrological-year-time-series
  labels <- c(paste("Winter",breakYear[1]),
           paste("Spring",breakYear[2]),
           paste("Summer",breakYear[3]),
           paste("Autumn",breakYear[4]),
           paste("Autumn",breakYear[5]))

  ######Here is problem - once I add labels parameter, function complains about different lengths
  result$hydroYear <- cut(df$date, breaks)

  result
}

1 Answer

Stack Overflow user

Accepted answer

Posted 2015-08-30 06:06:24

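First, I think it is unwise to hard-code the labels as variables inside the function. It is impossible to check this without some kind of reproducible example, but I can see what you are trying to achieve.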

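You claim that your breaks and labels are the correct length, but the function itself does not always work. This is the case even without labels: `cut` does not handle the last part of the dates, and that would still be true if the labels were present.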

For example:

library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))
a <- ToHydroQuarters(df)

tail(a)

Returns:

          date hydroYear
971 2011-08-29      <NA>
972 2011-08-30      <NA>
973 2011-08-31      <NA>
974 2011-09-01      <NA>
975 2011-09-02      <NA>
976 2011-09-03      <NA>

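Doing something like `breaks <- seq(from=DateStart, to=max(df$date)+90, by="quarter")` (note the `+90`) does fix this problem, because it forces a break to exist beyond the last date. This probably fixes the label problem you have in your function, but it does not make the function "generic".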
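As a minimal sketch of the point above, the snippet below builds the labels from the breaks themselves, so their number always equals `length(breaks) - 1`, which is what `cut` requires. The dates and the names `dates`, `starts`, `hydro_year` and `quarter` are made up for illustration. The season names follow the question's labelling (Winter for the quarter starting 1 November), which is an assumption rather than an official definition.

```r
# Hypothetical dates covering a little more than one hydrological year
dates <- as.Date(c("2013-04-28", "2013-06-19", "2013-12-29"))

# Quarterly breaks starting on 1 November; +90 days pushes the last break past max(dates)
start  <- as.Date("2012-11-01")
breaks <- seq(from = start, to = max(dates) + 90, by = "quarter")

# n breaks define n - 1 intervals, so build one label per interval start
starts <- breaks[-length(breaks)]
mon    <- as.integer(format(starts, "%m"))
yr     <- as.integer(format(starts, "%Y"))

# A quarter starting in November already belongs to the next hydrological year
hydro_year <- ifelse(mon >= 11, yr + 1L, yr)
season <- c("Winter", "Spring", "Summer", "Autumn")[(seq_along(starts) - 1) %% 4 + 1]
labels <- paste(season, hydro_year)

quarter <- cut(dates, breaks = breaks, labels = labels)
print(data.frame(date = dates, quarter = quarter))

# Without the +90 the last break falls before max(dates), so the tail becomes NA
breaks_short <- seq(from = start, to = max(dates), by = "quarter")
print(is.na(cut(dates, breaks = breaks_short)))
```

Deriving the labels from `breaks` keeps the two vectors in step for any date range, which a fixed vector of five labels cannot do.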

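Personally, in terms of coding, I think it is better practice to convert the month and year parts separately, because that is easier to understand. For example, you can use `library(lubridate)` to extract the month easily and then specify the breaks and labels as usual. I am thinking the function should look something like this: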

thq <- function(date) {
  mnth <- cut(month(date), breaks=c(1,4,7, 10, 12), 
              right=FALSE, include.lowest=TRUE, 
              labels=c("Spring", "Summer", "Autumn", "Winter"))
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}

So, with some fake data...

library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="month")))

thq <- function(date) {
  mnth <- cut(month(date), breaks=c(1,4,7, 10, 12), 
              right=FALSE, include.lowest=TRUE, 
              labels=c("Spring", "Summer", "Autumn", "Winter"))
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}

df$newdate <- thq(df$date)

Which gives the following output:

         date     newdate
1  2009-01-01 Spring 2009
2  2009-02-01 Spring 2009
3  2009-03-01 Spring 2009
4  2009-04-01 Summer 2009
5  2009-05-01 Summer 2009
6  2009-06-01 Summer 2009
7  2009-07-01 Autumn 2009
8  2009-08-01 Autumn 2009
9  2009-09-01 Autumn 2009
10 2009-10-01 Winter 2010
11 2009-11-01 Winter 2010
12 2009-12-01 Winter 2010
13 2010-01-01 Spring 2010
14 2010-02-01 Spring 2010
15 2010-03-01 Spring 2010
16 2010-04-01 Summer 2010
17 2010-05-01 Summer 2010
18 2010-06-01 Summer 2010
19 2010-07-01 Autumn 2010
20 2010-08-01 Autumn 2010
21 2010-09-01 Autumn 2010
22 2010-10-01 Winter 2011
23 2010-11-01 Winter 2011
24 2010-12-01 Winter 2011
25 2011-01-01 Spring 2011
26 2011-02-01 Spring 2011
27 2011-03-01 Spring 2011
28 2011-04-01 Summer 2011
29 2011-05-01 Summer 2011
30 2011-06-01 Summer 2011
31 2011-07-01 Autumn 2011
32 2011-08-01 Autumn 2011
33 2011-09-01 Autumn 2011

You can use the modulo operator to shift the months if the quarters start in an unusual month.

thq <- function(date) {
  # Shift the months so that November maps to 0 and October to 11.
  # Uses the `date` argument (the original referenced the global df$date).
  mnth <- cut(((month(date)+1) %% 12), breaks=c(0, 3, 6, 9, 12), 
              right=FALSE, include.lowest=TRUE, 
              labels=c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct")
              )
  # you will need to alter the return statement yourself, because
  # I feel there is enough information for you to do it, rather than
  # me changing it every time you change the question.
  return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}

library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))

df$new <- thq(df$date)

head(df)

Output:

> head(df)
        date          new
1 2009-01-01 Nov_Jan 2009
2 2009-01-02 Nov_Jan 2009
3 2009-01-03 Nov_Jan 2009
4 2009-01-04 Nov_Jan 2009
5 2009-01-05 Nov_Jan 2009
6 2009-01-06 Nov_Jan 2009
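The return statement that the answer leaves open could be completed along the following lines. This is only a sketch under stated assumptions: the hydrological year is taken to start on 1 November (the Polish definition from the question), so November and December are counted towards the following year. The name `hydro_quarter` is hypothetical, and base R `format()` is used instead of lubridate only to keep the example self-contained.

```r
# Hypothetical helper: label each date with its quarter and hydrological year,
# assuming the hydrological year starts on 1 November
hydro_quarter <- function(date) {
  mon <- as.integer(format(date, "%m"))
  yr  <- as.integer(format(date, "%Y"))

  # Shift the months: Nov -> 0, Dec -> 1, Jan -> 2, ..., Oct -> 11
  shifted <- (mon + 1) %% 12
  qtr <- cut(shifted, breaks = c(0, 3, 6, 9, 12),
             right = FALSE, include.lowest = TRUE,
             labels = c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct"))

  # November and December already belong to the next hydrological year
  hydro_year <- ifelse(mon >= 11, yr + 1L, yr)
  paste(qtr, hydro_year)
}

d <- as.Date(c("2009-10-31", "2009-11-01", "2009-12-31", "2010-01-15", "2010-02-01"))
res <- hydro_quarter(d)
print(data.frame(date = d, quarter = res))
```

The year is decided from the month number rather than from the quarter label, so the rule keeps working if the labels are renamed.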
Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/32293285
