I am working with the following dataframe in R.
Dataframe:
Cust_Id  DateTime             Price  Size  Type  Batch  PI1  Status  Source
TYY-132  2020-08-01 12:14:15  1500   35    RX1   Nov    NA   Done    RDT_DF
TYY-231  2020-08-01 11:04:45  1000   55    Nav   Dpc    NA   WIP     RFF_DF
TYY-131  2020-08-02 10:18:25  1000   25    Nov   Dpc    NA   Done    RFF_DF
TYY-232  2020-08-02 12:14:34  1200   45    RX1   Nvv    GO   Done    RFF_DF
TYY-112  2020-08-03 06:05:01  1300   54    RX1   Nov    GO   Open    RYU_DR
TYY-442  2020-08-03 20:40:50  1500   15    RTR   Nov    NA   Done    RUI_DY
TYY-432  2020-08-03 17:13:12  1000   48    REE   Nvv    NA   Done    RFF_DF
TYY-235  2020-08-04 15:19:11  500    51    RX1   Nov    NA   Done    RFF_DF

I want to group the data above by Date and compute counts for a specific category.
Category-1: Size is >= 35 and <= 55, Type = RX1, Batch = Nov, the value of PI1 must be NA, and Source must not contain anything like RDT_.
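Where:
Count_Order is the total number of Cust_Id for a particular Date.
Count_Done is the number of Cust_Id on that Date whose Status equals Done.
Total_% is Count_Done divided by Count_Order.
Count_Category1 is the number of Cust_Id on that Date that satisfy the Category-1 conditions above.
Count_Done_Category1 is the number of Cust_Id on that Date that satisfy the Category-1 conditions and whose Status equals Done.
%Category1 is Count_Done_Category1 divided by Count_Category1.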
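For reproducibility, the sample data and the Category-1 rule can be written out in base R. This is only a sketch: the column types are assumptions (DateTime parsed as POSIXct in UTC, PI1 holding real NA values rather than the string "NA"), and `is_cat1` is a name made up for illustration.

```r
# Sample data from the question, rebuilt by hand.
# Assumption: PI1 holds real missing values (NA), not the string "NA".
df <- data.frame(
  Cust_Id  = c("TYY-132", "TYY-231", "TYY-131", "TYY-232",
               "TYY-112", "TYY-442", "TYY-432", "TYY-235"),
  DateTime = as.POSIXct(c("2020-08-01 12:14:15", "2020-08-01 11:04:45",
                          "2020-08-02 10:18:25", "2020-08-02 12:14:34",
                          "2020-08-03 06:05:01", "2020-08-03 20:40:50",
                          "2020-08-03 17:13:12", "2020-08-04 15:19:11"),
                        tz = "UTC"),
  Price    = c(1500, 1000, 1000, 1200, 1300, 1500, 1000, 500),
  Size     = c(35, 55, 25, 45, 54, 15, 48, 51),
  Type     = c("RX1", "Nav", "Nov", "RX1", "RX1", "RTR", "REE", "RX1"),
  Batch    = c("Nov", "Dpc", "Dpc", "Nvv", "Nov", "Nov", "Nvv", "Nov"),
  PI1      = c(NA, NA, NA, "GO", "GO", NA, NA, NA),
  Status   = c("Done", "WIP", "Done", "Done", "Open", "Done", "Done", "Done"),
  Source   = c("RDT_DF", "RFF_DF", "RFF_DF", "RFF_DF",
               "RYU_DR", "RUI_DY", "RFF_DF", "RFF_DF"),
  stringsAsFactors = FALSE
)

# Category-1 flag: TRUE only when every condition holds for the row.
is_cat1 <- df$Size >= 35 & df$Size <= 55 &
  df$Type == "RX1" & df$Batch == "Nov" &
  is.na(df$PI1) & !grepl("RDT_", df$Source, fixed = TRUE)

# Customers that fall into Category-1
df$Cust_Id[is_cat1]
```

With this data only TYY-235 qualifies. TYY-132 meets the Size, Type, Batch and PI1 conditions but is excluded because its Source contains RDT_.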
Required output:
Date        Count_Order  Count_Done  Total_%  Count_Category1  Count_Done_Category1  %Category1
2020-08-04  1            1           100.00%  1                1                     100.00%
2020-08-03  3            2           66.66%   0                0                     0.00%
2020-08-02  2            2           100.00%  0                0                     0.00%
2020-08-01  2            2           100.00%  0                0                     0.00%

Posted on 2021-01-18 08:15:08
Here is one way to do it with dplyr:
library(dplyr)

df %>%
  # Flag rows that satisfy every Category-1 condition
  mutate(category1 = between(Size, 35, 55) & Type == 'RX1' &
           Batch == 'Nov' & is.na(PI1) & !grepl('RDT_', Source)) %>%
  # One group per calendar day
  group_by(Date = as.Date(DateTime)) %>%
  summarise(Count_Order = n(),
            Count_Done = sum(Status == 'Done'),
            `Total_%` = Count_Done/Count_Order * 100,
            Count_Category1 = sum(category1),
            Count_Done_Category1 = sum(category1 & Status == 'Done'),
            `%Category1` = Count_Done_Category1/Count_Category1 * 100) %>%
  # 0/0 gives NaN on days with no Category-1 rows; turn those into 0
  replace(is.na(.), 0)
#   Date       Count_Order Count_Done `Total_%` Count_Category1 Count_Done_Catego… `%Category1`
#   <date>           <int>      <int>     <dbl>           <int>              <int>        <dbl>
# 1 2020-08-01           2          1      50                 0                  0            0
# 2 2020-08-02           2          2     100                 0                  0            0
# 3 2020-08-03           3          2      66.7               0                  0            0
# 4 2020-08-04           1          1     100                 1                  1          100

https://stackoverflow.com/questions/65770697
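The printed result is sorted by ascending date and keeps the percentages as plain numbers, while the required output lists the newest date first and shows two-decimal percent strings. A minimal base R sketch of that last formatting step follows. The counts are copied from the printed result, and `res` and `pct` are hypothetical names introduced for illustration.

```r
# Counts copied from the printed result; `res` is a hypothetical name.
res <- data.frame(
  Date = as.Date(c("2020-08-01", "2020-08-02", "2020-08-03", "2020-08-04")),
  Count_Order = c(2L, 2L, 3L, 1L),
  Count_Done = c(1L, 2L, 2L, 1L),
  Count_Category1 = c(0L, 0L, 0L, 1L),
  Count_Done_Category1 = c(0L, 0L, 0L, 1L)
)

# Percentage as a two-decimal string; a zero denominator is reported as "0.00%".
pct <- function(num, den) {
  sprintf("%.2f%%", ifelse(den == 0, 0, num / den * 100))
}

res[["Total_%"]] <- pct(res$Count_Done, res$Count_Order)
res[["%Category1"]] <- pct(res$Count_Done_Category1, res$Count_Category1)

# Newest date first, columns in the order of the required output.
res <- res[order(res$Date, decreasing = TRUE),
           c("Date", "Count_Order", "Count_Done", "Total_%",
             "Count_Category1", "Count_Done_Category1", "%Category1")]
res
```

Note that `sprintf` rounds, so 2/3 is shown as 66.67% rather than the 66.66% in the required output. Also, 2020-08-01 comes out as 50.00% because only one of its two orders has Status Done in the sample data.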