首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >在跨多行的单个列中查找最大日期

在跨多行的单个列中查找最大日期
EN

Stack Overflow用户
提问于 2012-08-26 10:16:30
回答 5查看 40.4K关注 0票数 13

我有以下数据框:

代码语言:javascript
复制
id       <- c(1,1,2,3,3)
date     <- c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df       <- data.frame(id,date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")


id     date      date2
1   23-01-08 2008-01-23
1   01-11-07 2007-11-01
2   30-11-07 2007-11-30
3   17-12-07 2007-12-17
3   12-12-08 2008-12-12

现在我需要创建第四列,并在其中为每个id插入事务的最大日期。最终表格应如下所示:

代码语言:javascript
复制
id     date      date2        max
1   23-01-08 2008-01-23 2008-01-23
1   01-11-07 2007-11-01   0
2   30-11-07 2007-11-30 2007-11-30 
3   17-12-07 2007-12-17   0
3   12-12-08 2008-12-12 2008-12-12

如果你能帮我这个忙,我将不胜感激。

EN

回答 5

Stack Overflow用户

回答已采纳

发布于 2012-08-26 10:23:37

代码语言:javascript
复制
id<-c(1,1,2,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
# aggregate can be used for this type of thing
d = aggregate(df$date2,by=list(df$id),max)
# And merge the result of aggregate 
# with the original data frame
df2 = merge(df,d,by.x=1,by.y=1)
df2

  id     date      date2          x
1  1 23-01-08 2008-01-23 2008-01-23
2  1 01-11-07 2007-11-01 2008-01-23
3  2 30-11-07 2007-11-30 2007-11-30
4  3 17-12-07 2007-12-17 2008-12-12
5  3 12-12-08 2008-12-12 2008-12-12

编辑:当日期与最大日期不匹配时,您希望最后一列为“空”,因此可以尝试下一行。

代码语言:javascript
复制
df2[df2[,3]!=df2[,4],4]=NA

df2
  id     date      date2          x
1  1 23-01-08 2008-01-23 2008-01-23
2  1 01-11-07 2007-11-01       <NA>
3  2 30-11-07 2007-11-30 2007-11-30
4  3 17-12-07 2007-12-17       <NA>
5  3 12-12-08 2008-12-12 2008-12-12

当然,清理colname等总是很好的,但我把它留给你了。

票数 21
EN

Stack Overflow用户

发布于 2012-08-26 14:37:31

另一种方法是使用plyr包:

代码语言:javascript
复制
library(plyr)
ddply(df, "id", summarize, max = max(date2))

#  id        max
#1  1 2008-01-23
#2  2 2007-11-30
#3  3 2008-12-12

现在,这不是您想要的格式,因为它只显示每个id一次。不用担心,我们可以用transform代替summarize

代码语言:javascript
复制
ddply(df, "id", transform, max = max(date2))

#  id     date      date2        max
#1  1 01-11-07 2007-11-01 2008-01-23
#2  1 23-01-08 2008-01-23 2008-01-23
#3  2 30-11-07 2007-11-30 2007-11-30
#4  3 12-12-08 2008-12-12 2008-12-12
#5  3 17-12-07 2007-12-17 2008-12-12

正如@seandavi的答案所示,这将为每个id重复max日期。如果您想要将副本更改为NA,则可以执行以下操作:

代码语言:javascript
复制
within(ddply(df, "id", transform, max = max(date2)), max[max != date2] <- NA)
票数 11
EN

Stack Overflow用户

发布于 2012-08-26 11:18:11

代码语言:javascript
复制
library(sqldf)
tables<- '(SELECT * FROM df
           )
           AS t1,
           (SELECT id,max(date2) date2 FROM df GROUP BY id
           )
           AS t2'

out<-fn$sqldf("SELECT t1.*,t2.date2 mdate FROM $tables WHERE  t1.id=t2.id")
out$mdate<-as.Date(out$mdate)
out$mdate[out$date2!=out$mdate]<-NA
#  id     date      date2      mdate
#1  1 01-11-07 2007-11-01       <NA>
#2  1 23-01-08 2008-01-23 2008-01-23
#3  2 30-11-07 2007-11-30 2007-11-30
#4  3 12-12-08 2008-12-12 2008-12-12
#5  3 17-12-07 2007-12-17       <NA>
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/12127149

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档