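I have a dataset from which I want to keep only the top n categories by count, and then plot them using the other variables in the dataset. Essentially this is a one-level aggregation to find the top n categories, followed by a return to the full data so that it can be plotted in ggplot.
In the example below, I want the two most frequent exams by count, then plot their counts by year, with a facet_wrap panel for each exam.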
library(tidyverse)  # provides tribble(), the dplyr verbs and ggplot2

ap <-
tribble(
~year, ~examName,
2014, "Statistics",
2015, "Statistics",
2016, "Statistics",
2016, "Statistics",
2016, "Statistics",
2016, "Statistics",
2017, "Statistics",
2017, "Statistics",
2017, "Statistics",
2017, "Statistics",
2017, "Statistics",
2013, "Macroeconomics",
2013, "Macroeconomics",
2014, "Macroeconomics",
2015, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2013, "Calculus",
2014, "Calculus",
2015, "Calculus",
2016, "Calculus",
2017, "Calculus",
2017, "Psychology",
2017, "Psychology",
2017, "Psychology",
2017, "Psychology",
2017, "Psychology",
2018, "Psychology",
2018, "Psychology")
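# Count rows per exam and keep the two most frequent (head() does not handle ties)
ap_top <- ap %>%
  count(examName, sort = TRUE) %>%
  head(2) %>%
  # join back to the full data so that only rows for those exams remain
  inner_join(ap, by = "examName") %>%
  select(-n)

# Count per exam and year, then draw one line panel per exam
ap_top %>%
  count(examName, year) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)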
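My idea was to get the top n and then do an inner_join back onto the original dataset, and then use the result for plotting; essentially, I am using the inner join as a filter.
I'm sure there is a better way to do this, and I'm hoping for a more elegant solution. I'm all ears! A sample dataset is provided above (sorry that it's so long).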
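For readers following the "join as a filter" idea, dplyr has filtering joins for this purpose. The sketch below uses semi_join(), which keeps the rows of the first table that have a match in the second and adds no columns, so the select(-n) step is not needed. The compact tibble() construction is an assumption made for brevity; it is meant to reproduce the per-exam, per-year counts of the sample data, not the original row order.

```r
library(dplyr)
library(tibble)

# Compact rebuild of the sample data (assumed equivalent to the tribble in the question)
ap <- tibble(
  examName = rep(c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
                 times = c(11, 15, 5, 7)),
  year = c(
    rep(c(2014, 2015, 2016, 2017), times = c(1, 1, 4, 5)),
    rep(c(2013, 2014, 2015, 2016, 2017), times = c(2, 1, 1, 5, 6)),
    2013:2017,
    rep(c(2017, 2018), times = c(5, 2))
  )
)

# One-level aggregation: the two most frequent exams
top_exams <- ap %>%
  count(examName, sort = TRUE) %>%
  head(2)

# Filtering join: keep rows of `ap` whose examName appears in `top_exams`
ap_top <- semi_join(ap, top_exams, by = "examName")
```

The resulting ap_top has the same columns as ap and can be piped into the same count() and ggplot() calls as in the question.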
Posted on 2019-01-19 04:21:18
You don't need inner_join(). I would just determine the top two exams in a separate statement and then filter on them.
top_exams <- count(ap, examName) %>%
  top_n(2, n) %>%
  pull(examName)

ap %>%
  filter(examName %in% top_exams) %>%
  count(year, examName) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)
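As a side note, top_n() has been superseded in dplyr 1.0.0 and later by slice_max() and slice_min(). A minimal sketch of the same "top exams, then filter" approach with slice_max() follows; it assumes dplyr 1.0.0 or newer. The count column name total and the compact tibble() rebuild of the sample data are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tibble)

# Compact rebuild of the sample data (assumed equivalent to the tribble in the question)
ap <- tibble(
  examName = rep(c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
                 times = c(11, 15, 5, 7)),
  year = c(
    rep(c(2014, 2015, 2016, 2017), times = c(1, 1, 4, 5)),
    rep(c(2013, 2014, 2015, 2016, 2017), times = c(2, 1, 1, 5, 6)),
    2013:2017,
    rep(c(2017, 2018), times = c(5, 2))
  )
)

# Names of the two exams with the most rows; like top_n(), slice_max() keeps ties by default
top_exams <- ap %>%
  count(examName, name = "total") %>%
  slice_max(total, n = 2) %>%
  pull(examName)

# Filter the full data down to those exams and count per year
ap_counts <- ap %>%
  filter(examName %in% top_exams) %>%
  count(year, examName)
```

ap_counts can then be passed to the same ggplot() call with geom_line() and facet_wrap(~ examName).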
Posted on 2019-01-19 04:43:11
Another possibility:
ap %>%
  group_by(examName) %>%
  mutate(temp = n()) %>%
  ungroup() %>%
  mutate(temp = dense_rank(desc(temp))) %>%
  filter(temp %in% c(1,2)) %>%
  select(-temp) %>%
  count(year, examName) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)
This counts the rows for each "examName" and ranks those counts. It then keeps only the rows whose exam has the largest or second-largest count.
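The same count-then-rank idea can be written a little more compactly with add_count(), which attaches the per-group count without an explicit group_by()/ungroup() pair. This is only a sketch under the same assumptions as the answer; the column name total and the compact tibble() rebuild of the sample data are illustrative. One thing to keep in mind is that dense_rank() gives tied counts the same rank, so filtering on ranks 1 and 2 can keep more than two exams when counts are tied.

```r
library(dplyr)
library(tibble)

# Compact rebuild of the sample data (assumed equivalent to the tribble in the question)
ap <- tibble(
  examName = rep(c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
                 times = c(11, 15, 5, 7)),
  year = c(
    rep(c(2014, 2015, 2016, 2017), times = c(1, 1, 4, 5)),
    rep(c(2013, 2014, 2015, 2016, 2017), times = c(2, 1, 1, 5, 6)),
    2013:2017,
    rep(c(2017, 2018), times = c(5, 2))
  )
)

# Attach each exam's total row count, rank the totals, keep ranks 1 and 2
ap_top <- ap %>%
  add_count(examName, name = "total") %>%
  filter(dense_rank(desc(total)) <= 2) %>%
  select(-total)

# Tie behaviour: two exams tied on 11 rows would both receive rank 2
tie_ranks <- dense_rank(desc(c(15, 11, 11, 5)))
```

If exactly n categories are required even when counts are tied, a tie-breaking rule has to be chosen explicitly, for example by using row_number() in place of dense_rank().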
https://stackoverflow.com/questions/54260789