我有一个数据集,如下所示:
city period_day date
1 barcelona morning 2017-01-15
2 sao_paulo afternoon 2016-12-07
3 sao_paulo morning 2016-11-16
4 barcelona morning 2016-11-06
5 barcelona afternoon 2016-12-31
6 sao_paulo afternoon 2016-11-30
7 barcelona morning 2016-10-15
8 barcelona afternoon 2016-11-30
9 sao_paulo afternoon 2016-12-24
10 sao_paulo afternoon 2017-02-02
对于每一行,我想要计算有多少行的日期比该行的日期更早,城市和period_day都是如此。在本例中,我希望得到以下结果:
city period_day date row_count
1 barcelona morning 2017-01-15 2
2 sao_paulo afternoon 2016-12-07 1
3 sao_paulo morning 2016-11-16 0
4 barcelona morning 2016-11-06 1
5 barcelona afternoon 2016-12-31 1
6 sao_paulo afternoon 2016-11-30 0
7 barcelona morning 2016-10-15 0
8 barcelona afternoon 2016-11-30 0
9 sao_paulo afternoon 2016-12-24 2
10 sao_paulo afternoon 2017-02-02 3
当row_count等于0时,这意味着它是较旧的日期。
我想出了一个解决方案,但由于数据太多,花了太长时间。这就是代码:
get_count_function <- function(df) {
idx <- 1:nrow(df)
count <- sapply(idx, function(x) {
name_city <-
df %>% select(city) %>% filter(row_number() == x) %>% pull()
name_period <-
df %>% select(period_day) %>% filter(row_number() == x) %>% pull()
date_row <- df %>%
select(date) %>%
filter(row_number() == x) %>%
pull()
date_any_row <- df %>%
filter(dplyr::row_number() != x,
city == name_city,
period_day == name_period) %>%
select(date) %>%
pull()
how_many <- sum(date_row > date_any_row)
return(how_many)
})
return(count)
}
我怎样才能使这个函数更有效率呢?
发布于 2019-05-10 03:52:55
如果您愿意使用data.table
包,这应该是可行的:
library(data.table)
dat <- read.table(header=T, row.names=1, text="
city period_day date
1 barcelona morning 2017-01-15
2 sao_paulo afternoon 2016-12-07
3 sao_paulo morning 2016-11-16
4 barcelona morning 2016-11-06
5 barcelona afternoon 2016-12-31
6 sao_paulo afternoon 2016-11-30
7 barcelona morning 2016-10-15
8 barcelona afternoon 2016-11-30
9 sao_paulo afternoon 2016-12-24
10 sao_paulo afternoon 2017-02-02
")
dat <- as.data.table(dat)
dat[, row_count := (order(as.Date(date)) - 1), by=.(city, period_day)]
# Check
dat
## city period_day date row_count
## 1: barcelona morning 2017-01-15 2
## 2: sao_paulo afternoon 2016-12-07 1
## 3: sao_paulo morning 2016-11-16 0
## 4: barcelona morning 2016-11-06 1
## 5: barcelona afternoon 2016-12-31 1
## 6: sao_paulo afternoon 2016-11-30 0
## 7: barcelona morning 2016-10-15 0
## 8: barcelona afternoon 2016-11-30 0
## 9: sao_paulo afternoon 2016-12-24 2
## 10: sao_paulo afternoon 2017-02-02 3
https://stackoverflow.com/questions/56065284
复制相似问题