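I'm a beginner and would be grateful for any help. Basically, I want to create an output CSV file that contains, for each outbreak, the frequency (number of cases), the first onset date, the last onset date, and the total duration.
I have a dataset that looks like this: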
df <- data.frame(outbreak_name = c("A","A","A","A","B","B","C","C","C"), onset = as.Date(c("2021-1-11","2021-2-2","2021-2-3","2021-3-3","2021-5-5","2021-7-5","2021-4-5","2021-2-3","2021-12-4")))
I have been able to create columns with the dates like this:
summary_ob <- df %>%
  group_by(outbreak_name) %>%
  mutate(first_onset = min(onset)) %>%
  mutate(last_onset = max(onset)) %>%
  mutate(duration = last_onset - first_onset)
I can also create a frequency table with a simple count:
summary_freq <- df %>%
  group_by(outbreak_name) %>%
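  summarize(cases = n())
What I can't figure out is how to combine the two, so that the result shows that outbreak A has 4 cases, the first onset was xx, the last onset was xx, and the outbreak has lasted xx days. Then I want to write that out with write.csv.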
Posted on 2021-09-23 01:20:36
library(dplyr)
df %>%
  group_by(outbreak_name) %>%
  summarize(
    cases = n(),
    first_onset = min(onset),
    last_onset = max(onset)
  ) %>%
  mutate(duration = last_onset - first_onset)
# A tibble: 3 x 5
  outbreak_name cases first_onset last_onset duration
  <chr>         <int> <date>      <date>     <drtn>  
1 A                 4 2021-01-11  2021-03-03  51 days
2 B                 2 2021-05-05  2021-07-05  61 days
3 C                 3 2021-02-03  2021-12-04 304 days
After that, you can export the result with write_csv (from the readr package) or with base R's write.csv.
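As a minimal sketch of the whole pipeline including the export step, the summary above can be assigned to an object and written to disk. The names summary_ob (reused from the question), duration_days, and out_file are illustrative choices, and the temporary output path is an assumption; substitute your own. The duration is converted to a plain number of days here so the CSV column holds values like 51 rather than a difftime.

```r
library(dplyr)

# Sample data from the question; as.Date() wraps the whole character vector
df <- data.frame(
  outbreak_name = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  onset = as.Date(c("2021-01-11", "2021-02-02", "2021-02-03", "2021-03-03",
                    "2021-05-05", "2021-07-05", "2021-04-05", "2021-02-03",
                    "2021-12-04"))
)

# One row per outbreak: case count, first and last onset
summary_ob <- df %>%
  group_by(outbreak_name) %>%
  summarise(
    cases = n(),
    first_onset = min(onset),
    last_onset = max(onset)
  ) %>%
  # duration_days is a hypothetical column name; stored as a plain number of days
  mutate(duration_days = as.numeric(last_onset - first_onset, units = "days"))

# out_file is a hypothetical path in the session's temporary directory
out_file <- file.path(tempdir(), "outbreak_summary.csv")
write.csv(summary_ob, out_file, row.names = FALSE)
```

Setting row.names = FALSE keeps write.csv from adding an extra column of row numbers. If readr is installed, readr::write_csv(summary_ob, out_file) should serve the same purpose and omits row names by default.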
Posted on 2021-09-23 01:32:37
We can do this with diff on the range of 'onset':
library(dplyr)
df %>%
    group_by(outbreak_name) %>%
    summarise(cases = n(), duration = diff(range(onset)))
-output
# A tibble: 3 x 3
  outbreak_name cases duration
  <chr>         <int> <drtn>  
1 A                 4  51 days
2 B                 2  61 days
3 C                 3 304 days
https://stackoverflow.com/questions/69292909