Combining multiple summary statistics in a dplyr analysis

Stack Overflow user

Asked 2018-08-22 13:48:11

Answers: 1, Views: 94, Followers: 0, Votes: 0

Given this sample data:

df1 <- structure(list(practice = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), drug = c("123A456", 
"123A567", "123A123", "123A567", "123A456", "123A123", "123A567", 
"123A567", "998A125", "123A456", "998A125", "123A567", "123A456", 
"998A125", "123A567", "123A567", "123A567", "998A125", "123A123", 
"998A125", "123A123", "123A456", "998A125", "123A567", "998A125", 
"123A456", "123A123", "998A125", "123A567", "123A567", "998A125", 
"123A456", "123A123", "123A567", "123A567", "998A125", "123A456"
), items = c(1, 2, 3, 4, 5, 4, 6, 7, 8, 9, 5, 6, 7, 8, 9, 4, 
5, 6, 3, 2, 3, 4, 5, 6, 7, 4, 3, 2, 3, 4, 5, 4, 3, 4, 5, 6, 4
), quantity = c(1, 2, 4, 5, 3, 2, 3, 5, 4, 5, 7, 9, 5, 3, 4, 
6, 1, 2, 4, 5, 3, 2, 3, 5, 4, 5, 7, 9, 5, 3, 4, 6, 1, 2, 4, 5, 
3)), .Names = c("practice", "drug", "items", "quantity"), row.names = c(NA, 
-37L), spec = structure(list(cols = structure(list(practice = structure(list(), class = c("collector_integer", 
"collector")), drug = structure(list(), class = c("collector_character", 
"collector")), items = structure(list(), class = c("collector_integer", 
"collector")), quantity = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("practice", "drug", "items", "quantity"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"), class = c("tbl_df", 
"tbl", "data.frame"))

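I want to run several analyses on it. I think dplyr is the right tool, but I am struggling to combine the individual steps.

My data is a list of drugs, and I want to summarise a subset of them (defined by the first three digits of the drug code).

I want to report the totals for those drugs (codes starting with 123), drug123.items and drug123.quantity, for each practice.
I also want to report the totals for all drugs in the data (all_items and all_quantity), so that drug123 can eventually be expressed as a percentage of all drugs.

I can do parts of the analysis separately. For example, this gives the total number of items per practice: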

practice <- df1 %>% 
  group_by(practice) %>% 
  summarise(all.items = sum(items))

...and this keeps only the drugs I am interested in:

drug123 <- df1 %>% 
  filter(substr(drug, 1, 3) == "123")


ALL.drug123 <- aggregate(drug123$quantity, by=list(Category=drug123$practice), FUN=sum)

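But how do I put it all together?

I want a dataframe with the following columns:

practice (1, 2, 3 in the data provided)

drug123.items: number of items for drug123

drug123.quantity: quantity for drug123

all.items: number of items for all drugs

all.quantity: quantity for all drugs

Any ideas?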

dplyr

plyr

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-08-22 14:51:16

I think this is what you're looking for:

df1 %>%
  group_by(practice) %>%
  summarize(items_123 = sum(if_else(stringr::str_detect(drug, '^123'), items, 0)),
            quantity_123 = sum(if_else(stringr::str_detect(drug, '^123'), quantity, 0)),
            all_items = sum(items),
            all_quantity = sum(quantity))

# A tibble: 3 x 5
  practice items_123 quantity_123 all_items all_quantity
     <int>     <dbl>        <dbl>     <dbl>        <dbl>
1        1        54           44        75           58
2        2        44           42        66           65
3        3        24           19        35           28
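
The question also mentions eventually expressing drug123 as a percentage of all drugs. A minimal sketch of that extra step follows, under some assumptions: `toy` is a small made-up data frame with the same columns as `df1` (not the asker's data), the `pct_*` column names are hypothetical, and base R's `startsWith()` is swapped in for `stringr::str_detect()` so that only dplyr is needed.

```r
library(dplyr)

# Hypothetical toy data with the same columns as df1 (not the asker's data).
toy <- data.frame(
  practice = c(1L, 1L, 1L, 2L, 2L),
  drug     = c("123A456", "998A125", "123A567", "123A123", "998A125"),
  items    = c(1, 4, 3, 4, 6),
  quantity = c(2, 2, 4, 5, 5),
  stringsAsFactors = FALSE
)

result <- toy %>%
  group_by(practice) %>%
  summarise(
    # Logical subsetting keeps only rows whose drug code starts with "123".
    items_123    = sum(items[startsWith(drug, "123")]),
    quantity_123 = sum(quantity[startsWith(drug, "123")]),
    all_items    = sum(items),
    all_quantity = sum(quantity)
  ) %>%
  mutate(
    # Assumed column names: share of drug123 in each practice's totals.
    pct_items_123    = 100 * items_123 / all_items,
    pct_quantity_123 = 100 * quantity_123 / all_quantity
  )

print(result)
```

Replacing `toy` with `df1` should apply the same steps to the data in the question. Computing the percentages in a separate `mutate()` keeps the four totals available as their own columns.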

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/51968384