I have a data frame that looks like this:
DF1
ID Value Type Date
II-1 150 Active 2019-01-01 15:34:18
II-1 175 Active 2019-01-01 15:34:18
II-1 165 Active 2019-01-01 15:34:18
II-1 168 Active 2019-01-01 15:34:18
II-2 200 InActive 2019-01-05 17:14:20
II-2 45 InActive 2019-01-05 17:14:20
II-3 34 InActive 2019-02-04 11:04:12
II-4 78 InActive 2019-02-01 12:33:14
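I need to convert the data above into the format shown below, in R.
Here 1-3 and 3-5 are buckets based on how many times each ID occurs, and they are broken out separately under each Type.
Example: if ID II-1 appears four times, it falls into the 3-5 bucket; likewise, ID II-2 falls into the 1-3 bucket.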
Month Total Active 1-3 3-5 InActive 1-3 3-5 Hold 1-3 3-5
Jan-19 6 2 1 1 0 0 0 0 0 0
Feb-19 2 0 0 0 2 2 0 0 0 0
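To make the bucket rule concrete, here is a minimal base R sketch. The per-ID counts are typed in by hand from the sample data, and because the two ranges overlap at 3, the sketch assumes a count of exactly 3 belongs to 1-3; that tie-break is an assumption, not something stated above.

```r
# Occurrence counts per ID, copied by hand from the sample data above
id_counts <- c("II-1" = 4L, "II-2" = 2L, "II-3" = 1L, "II-4" = 1L)

# Assumption: a count of exactly 3 goes to "1-3"; anything larger goes to "3-5"
bucket <- ifelse(id_counts <= 3, "1-3", "3-5")

print(bucket)
```

II-1 lands in 3-5 and the other three IDs land in 1-3, which matches the example given for II-1 and II-2.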
Posted on 2019-09-17 05:56:54
If you also want the totals, you can compute them separately and then join them:
## Libraries
library(tidyverse)
library(lubridate)

## Alter the DF1 table to get months in the right format: DF1_new
DF1_new <- DF1 %>%
  # Create new month column
  mutate(Month = as_factor(str_c(month(Date, label = TRUE), year(Date), sep = "-")),
         Type = as_factor(Type)) %>%
  # Reorder columns
  select(Month, everything())

## Group DF1_new by Month and Type: right
right <- DF1_new %>%
  # Count ID by month and type
  count(Month, Type, ID) %>%
  # Place each in buckets by count
  mutate(Bucket = case_when(n < 4 ~ "1-3", TRUE ~ "4-5")) %>%
  # Combine bucket names
  unite(Type.Bucket, c(Type, Bucket), sep = ".") %>%
  # Count how many IDs fall in each bucket type
  count(Month, Type.Bucket) %>%
  spread(Type.Bucket, n)

## Get month totals and join to month/type data frame
DF1_new %>%
  # Count ID by month
  group_by(Month) %>%
  summarise(Total = n()) %>%
  left_join(right, by = "Month")
Result:
Month Total Active.4-5 InActive.1-3
Jan-2019 6 1 1
Feb-2019 2 NA 2
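The NA for Feb-2019 marks a month/bucket combination that has no IDs. If zeros are preferred, one option is to fill the gaps afterwards; the sketch below does this in base R on a hand-typed copy of the result above (the names `result` and `filled` are only illustrative). Passing `fill = 0` to `spread()` inside the pipeline should achieve the same thing.

```r
# Hand-typed copy of the result shown above (illustrative only)
result <- data.frame(
  Month = c("Jan-2019", "Feb-2019"),
  Total = c(6L, 2L),
  `Active.4-5` = c(1L, NA),
  `InActive.1-3` = c(1L, 2L),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# Replace every missing count with 0
filled <- result
filled[is.na(filled)] <- 0

print(filled)
```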
Posted on 2019-09-17 04:32:02
Here is a rough approach that can be adjusted to get the specific columns you want.
library(lubridate); library(tidyverse)
DF1 %>%
  count(Month = floor_date(Date, "month"), Type, ID) %>%
  mutate(bucket = case_when(n < 4 ~ "1-3", TRUE ~ "4-5")) %>%
  count(Month, Type, bucket) %>%
  unite(column, Type:bucket) %>%
  spread(column, n, fill = 0)
## A tibble: 2 x 3
# Month `Active_4-5` `InActive_1-3`
# <dttm> <dbl> <dbl>
#1 2019-01-01 00:00:00 1 1
#2 2019-02-01 00:00:00 0 2
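The Month column here is a date-time at the start of each month, whereas the question asks for labels such as Jan-19. A possible adjustment, sketched in base R on hand-typed month starts (the UTC time zone is an assumption, and `month_start` and `month_label` are illustrative names), is to format the values with `format()`:

```r
# Month starts like the ones floor_date() produced in the output above
# (typed in by hand; the UTC time zone is an assumption)
month_start <- as.POSIXct(c("2019-01-01", "2019-02-01"), tz = "UTC")

# %b is the abbreviated month name (locale-dependent), %y the two-digit year
month_label <- format(month_start, "%b-%y")

print(month_label)
```

In an English locale this should print labels in the Jan-19 style. The labels are plain character strings, so they no longer sort chronologically on their own; it is probably safest to format Month as the last step.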
Data:
DF1 <- structure(list(ID = c("II-1", "II-1", "II-1", "II-1", "II-2",
"II-2", "II-3", "II-4"), Value = c(150L, 175L, 165L, 168L, 200L,
45L, 34L, 78L), Type = c("Active", "Active", "Active", "Active",
"InActive", "InActive", "InActive", "InActive"), Date = structure(c(1546385658,
1546385658, 1546385658, 1546385658, 1546737260, 1546737260, 1549307052,
1549053194), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA,
-8L), class = "data.frame")
https://stackoverflow.com/questions/57967069