我有以下数据:
structure(list(Filters = structure(c(4L, 2L, 6L, 5L, 1L, 3L,
7L), .Label = c("product type : shade || shade family : white",
"space : bedroom || product type : appliances", "space : bedroom || shade family : white",
"space : living room || product type : shade || shade family : white",
"space : living room || product type : shade|appliances", "space : living room || shade family : white|black",
"space : living room || shade family : yellows & greens"), class = "factor"),
Count = c(143L, 131L, 119L, 101L, 94L, 84L, 82L)), class = "data.frame", row.names = c(NA,
-7L))
我想要得到数据中每个类别和子类别的总数。此外,如果类别有多个子类别,则每个类别都应该有每个类别的计数和。
例如,要获得sum of (产品类型:家电) =>产品类型:家电&产品类型:shade|家电两者都符合条件。
我正在寻找类似如下的输出:
Category Count
1 space 660
2 product type 469
3 shade family 522
4 space : living room 445
5 product type : shade 338
6 product type : appliances 232
7 So on …. ..
提前谢谢。
发布于 2020-05-24 21:37:54
一种方法是使用separate_rows
和separate
library(dplyr)
library(tidry)
df %>%
separate_rows(Filters,sep = " \\|\\| ") %>%
separate(Filters, into = c("Category","Subcategory")," : ") %>%
group_by(Category, Subcategory) %>%
summarise(Count = sum(Count)) %>%
bind_rows(group_by(.,Category) %>%
summarize(Count = sum(Count)) %>%
mutate(Subcategory = "Total")) %>%
separate_rows(Subcategory,sep = "\\|") %>%
arrange(-Count)
# A tibble: 13 x 3
# Groups: Category [3]
Category Subcategory Count
<chr> <chr> <int>
1 space Total 660
2 shade family Total 522
3 product type Total 469
4 space living room 445
5 shade family white 321
6 product type shade 237
7 space bedroom 215
8 product type appliances 131
9 shade family white 119
10 shade family black 119
11 product type shade 101
12 product type appliances 101
13 shade family yellows & greens 82
感谢来自@Bishops_Guest的this answer的一些启发。
https://stackoverflow.com/questions/61986465
复制相似问题