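This is a follow-up to my earlier question:
How to extract the density value from ggplot in r
The fruit dataset there is actually the data for country A. I now have a second dataset for country B, and I want to compare the values of the two. However, the density plots for apples differ on the y-axis between the two countries: the density for country A peaks at around 0.8, while the density for country B peaks at around 0.4.
Example for country A:
Q. Country B has a similar curve, but its highest density value on the y-axis is only 0.4. So how can I compare them?
Minimal example code: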
library(ggplot2)
set.seed(1234)
df = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5),
                   rnorm(200, mean = 70, sd = 5),
                   rnorm(200, mean = 75, sd = 5)))
)
dim(df) #[1] 800 2
ggplot(df, aes(x = weight)) +
  geom_density() +
  facet_grid(fruits ~ ., scales = "free", space = "free")
g = ggplot(df, aes(x = weight)) +
  geom_density() +
  facet_grid(fruits ~ ., scales = "free", space = "free")
p = ggplot_build(g)
# one data frame of (x, density) per facet panel
sp = split(p$data[[1]][c("x", "density")], p$data[[1]]$PANEL)
apple_df = sp[[1]]
sum(apple_df$density) # this is equal to 10.43877 but I want it to be one
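Posted on 2021-10-27 12:27:53
Suppose you have two data frames, one for each country (df_c1 and df_c2). The idea is to combine the two data frames and add a column that identifies the country.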
library(dplyr)
library(ggplot2)
df_c1 = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5),
                   rnorm(200, mean = 70, sd = 5),
                   rnorm(200, mean = 75, sd = 5)))
)
df_c2 = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),
  weight = round(c(rnorm(200, mean = 20, sd = 3),
                   rnorm(200, mean = 35, sd = 6),
                   rnorm(200, mean = 40, sd = 2),
                   rnorm(200, mean = 15, sd = 4)))
)
# stack both countries and label each row with its country
df <- rbind(
  df_c1 %>% mutate(country = "country 1"),
  df_c2 %>% mutate(country = "country 2")
)
# one density curve per country, overlaid within each fruit panel
df %>%
  ggplot() +
  geom_density(aes(x = weight, color = country)) +
  facet_grid(fruits ~ ., scales = "free", space = "free")
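Area under the curve
Another way to work with the distributions is to compute them first with the density function and then plot the resulting values.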
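# Note: summarise() returns many rows per group here; dplyr >= 1.1.0
# deprecates that usage in favour of reframe(), which accepts the same arguments.
dens1 <- df_c1 %>%
  group_by(fruits) %>%
  summarise(x = density(weight)$x, y = density(weight)$y) %>%
  mutate(country = "country 1")
dens2 <- df_c2 %>%
  group_by(fruits) %>%
  summarise(x = density(weight)$x, y = density(weight)$y) %>%
  mutate(country = "country 2")
df_dens <- rbind(dens1, dens2)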
Now in ggplot we use geom_line:
df_dens %>%
  ggplot() +
  geom_line(aes(x, y, color = country)) +
  facet_grid(fruits ~ ., scales = "free", space = "free")
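To go with the overlaid plot, the peak of each estimated curve can also be read directly from the output of density(). The following is only a minimal sketch in base R: apple_c1 and apple_c2 are hypothetical stand-ins for the Apple weights of each country (drawn with the same parameters as above), and peak_of is a helper name introduced here for illustration, not a function from any package.

```r
set.seed(1234)
# Hypothetical stand-ins for the Apple weights of each country
apple_c1 <- round(rnorm(200, mean = 65, sd = 5))
apple_c2 <- round(rnorm(200, mean = 35, sd = 6))

# Illustrative helper: location (x) and height of the highest point
# of the kernel density estimate of a numeric vector
peak_of <- function(w) {
  d <- density(w)
  i <- which.max(d$y)
  c(x = d$x[i], height = d$y[i])
}

# One row per country, columns "x" and "height"
peaks <- rbind(country1 = peak_of(apple_c1), country2 = peak_of(apple_c2))
peaks
```

A different peak height is not a problem in itself. A more spread-out sample generally produces a lower, wider curve, while the area under each curve stays close to 1, so the curves remain comparable on the same axes.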
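If you want to measure the area under a curve, define the differential (the spacing between consecutive x values).
We select just one curve, for example country == "country 1" and fruits == "Apple":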
df_single_curve <- df_dens %>%
  filter(country == "country 1" & fruits == "Apple")
# differential
xx <- df_single_curve$x
dx <- xx[2L] - xx[1L]
yy <- df_single_curve$y
# integral
I <- sum(yy) * dx
I
# [1] 1.000965
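This also explains why the plain sum in the question is not 1: the density values are curve heights sampled on a grid of x values, not probabilities, so they have to be multiplied by the grid spacing to give an area. A minimal sketch in base R, assuming an illustrative sample w (not the original data) and using the fact that density() evaluates the estimate at 512 equally spaced points by default:

```r
set.seed(1234)
w <- round(rnorm(200, mean = 65, sd = 5))  # illustrative sample of weights

d  <- density(w)     # heights d$y on 512 equally spaced points d$x (default n)
dx <- diff(d$x)[1]   # grid spacing

sum(d$y)                     # sum of heights: depends on the grid, not 1
area_rect <- sum(d$y) * dx   # rectangle (Riemann) sum: approximately 1

# Trapezoidal rule over the same grid, as an alternative approximation
area_trap <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

c(rectangle = area_rect, trapezoid = area_trap)
```

The same reasoning should apply to the values extracted with ggplot_build() in the question: multiplying sum(apple_df$density) by the spacing of apple_df$x is expected to give a value close to 1.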
https://stackoverflow.com/questions/69738153