Question: How to compare datasets with ggplot2 geom_density()

Stack Overflow user

Asked 2021-10-27 11:52:13

1 answer · 214 views · 0 followers · 0 votes

This is a follow-up to a question I asked earlier:

How to extract the density value from ggplot in r

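This fruit dataset is actually the data for country A. Now I have a second dataset for country B, and I want to compare the two. However, the apple density plots (y-axis) differ between the countries: country A's density peaks at around 0.8, while country B's peaks at around 0.4.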

Example for country A:

Country B has a similar curve, but its highest density value on the y-axis is only 0.4. So how can I compare them?

Minimal example code:

library(ggplot2) 
set.seed(1234) 
df = data.frame(
    fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),
    weight = round(c(rnorm(200, mean = 55, sd=5),
                     rnorm(200, mean=65, sd=5),
                     rnorm(200, mean=70, sd=5),
                     rnorm(200, mean=75, sd=5)))
) 

dim(df) #[1] 800   2

g = ggplot(df, aes(x = weight)) + 
  geom_density() + 
  facet_grid(fruits ~ ., scales = "free", space = "free")
g

p = ggplot_build(g)

# split the computed densities by facet panel; panel 1 is "Apple",
# the first factor level in alphabetical order
sp = split(p$data[[1]][c("x", "density")], p$data[[1]]$PANEL)
apple_df = sp[[1]]

sum(apple_df$density) # this is equal to 10.43877, but I want it to be 1
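The sum is not 1 because the `density` column holds the height of the curve at each grid point, not the probability mass of each step; each height has to be multiplied by the grid spacing. A minimal sketch in base R, assuming the same default grid size (n = 512) that `geom_density()` uses and a hypothetical sample named `apple`:

```r
# Sketch (assumption): base R density() stands in for the values that
# geom_density() stores; both evaluate the estimate on a 512-point grid by default.
set.seed(1234)
apple <- round(rnorm(200, mean = 65, sd = 5))  # hypothetical apple weights

d <- density(apple)              # n = 512 evenly spaced grid points
dx <- d$x[2] - d$x[1]            # constant width of each grid step

raw_sum <- sum(d$y)              # sum of heights only: roughly 1 / dx, not 1
area <- sum(d$y) * dx            # heights times step width: approximately 1

print(c(dx = dx, raw_sum = raw_sum, area = area))
```

Applied to the question's data, the same idea would be `sum(apple_df$density) * diff(apple_df$x)[1]`, since the x values produced by `ggplot_build()` are also evenly spaced.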

Tags: ggplot2, probability-density

1 Answer

Stack Overflow user

Answered 2021-10-27 12:27:53

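Suppose you have data frames for two different countries (df_c1 and df_c2). The idea is to combine the two data frames and add a column that identifies the country.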

library(dplyr)
library(ggplot2)

df_c1 = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),   
  weight = round(c(rnorm(200, mean = 55, sd=5),
                   rnorm(200, mean=65, sd=5), 
                   rnorm(200, mean=70, sd=5), 
                   rnorm(200, mean=75, sd=5)))
)

df_c2 = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),   
  weight = round(c(rnorm(200, mean = 20, sd=3),
                   rnorm(200, mean=35, sd=6), 
                   rnorm(200, mean=40, sd=2), 
                   rnorm(200, mean=15, sd=4)))
)


df <- rbind(
  df_c1 %>% mutate(country = "country 1"), 
  df_c2 %>% mutate(country = "country 2")
)


df %>% 
  ggplot() + 
  geom_density(aes(x = weight, color = country)) +
  facet_grid(fruits ~ ., scales = "free", space = "free")
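A lower peak does not by itself mean "less" of anything: every density curve encloses an area of 1, so a more spread-out curve has to be lower. As a rough illustration, assuming approximately normal data, a normal density peaks at 1 / (sd * sqrt(2 * pi)), so doubling the standard deviation halves the peak. A small base R check of that relationship:

```r
# Sketch (assumption: roughly normal data): a normal density peaks at
# 1 / (sd * sqrt(2 * pi)), so its height depends only on the spread.
peak_height <- function(sd) dnorm(0, mean = 0, sd = sd)

ratio <- peak_height(5) / peak_height(10)  # twice the spread -> half the height

# Both curves still enclose an area of 1
area_sd5  <- integrate(dnorm, -Inf, Inf, mean = 0, sd = 5)$value
area_sd10 <- integrate(dnorm, -Inf, Inf, mean = 0, sd = 10)$value

print(c(ratio = ratio, area_sd5 = area_sd5, area_sd10 = area_sd10))
```

Under that normality assumption, peaks of about 0.8 versus 0.4 would be consistent with the second sample being roughly twice as spread out, rather than with fewer observations.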

Area under the curve

Another way to work with the distributions is to compute them first with the `density` function and then plot those values.

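# reframe() (dplyr >= 1.1.0) returns one row per grid point of each group's density;
# using summarise() to return several rows per group is deprecated there
dens1 <- df_c1 %>% 
  group_by(fruits) %>% 
  reframe(x = density(weight)$x, y = density(weight)$y) %>% 
  mutate(country = "country 1")

dens2 <- df_c2 %>% 
  group_by(fruits) %>% 
  reframe(x = density(weight)$x, y = density(weight)$y) %>% 
  mutate(country = "country 2")

df_dens <- rbind(dens1, dens2)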

Now plot these values in ggplot with geom_line:

df_dens %>% 
  ggplot() +
  geom_line(aes(x, y, color = country)) + 
  facet_grid(fruits ~ ., scales = "free", space = "free")
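To compare two curves numerically rather than only by eye, one option (not part of the original answer) is to evaluate both densities on the same grid using the `from`, `to` and `n` arguments of `density()`, then compare them point by point, for example through the overlap area `sum(pmin(y1, y2)) * dx`, which is 1 for identical densities and 0 for non-overlapping ones. A sketch with hypothetical apple samples `apple_c1` and `apple_c2`:

```r
# Sketch: hypothetical apple weights for the two countries
set.seed(1234)
apple_c1 <- rnorm(200, mean = 65, sd = 5)
apple_c2 <- rnorm(200, mean = 60, sd = 10)

# One shared, evenly spaced grid wide enough to hold both curves
lo <- min(apple_c1, apple_c2) - 30
hi <- max(apple_c1, apple_c2) + 30
d1 <- density(apple_c1, from = lo, to = hi, n = 1024)
d2 <- density(apple_c2, from = lo, to = hi, n = 1024)
dx <- d1$x[2] - d1$x[1]

# Area shared by both curves: 1 = identical densities, 0 = no overlap
overlap <- sum(pmin(d1$y, d2$y)) * dx

print(c(area1 = sum(d1$y) * dx, area2 = sum(d2$y) * dx, overlap = overlap))
```

Because both curves use identical x values, differences such as `d1$y - d2$y` are also meaningful point by point.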

If you want to measure the area under a curve, define the differential dx (the spacing between grid points).
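In formula form, with the estimate evaluated at evenly spaced points x_1, ..., x_n with heights y_1, ..., y_n, the area is approximated by a Riemann sum:

```latex
I = \int f(x)\,dx \;\approx\; \sum_{i=1}^{n} y_i \,\Delta x,
\qquad \Delta x = x_2 - x_1
```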

Take a single curve, for example country == "country 1" and fruits == "Apple":

df_single_curve <- df_dens %>% 
  filter(country == "country 1" & fruits == "Apple")

# differential
xx <- df_single_curve$x
dx <- xx[2L] - xx[1L]
yy <- df_single_curve$y

# integral
I <- sum(yy) * dx
I
# [1] 1.000965
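The `sum(yy) * dx` shortcut relies on the grid being evenly spaced, which holds for the output of `density()`. If the x values are not evenly spaced, the trapezoidal rule is a common alternative. A sketch with a hypothetical helper `trapz()` (not a base R function):

```r
# Hypothetical helper: trapezoidal rule, valid for unevenly spaced x as well
trapz <- function(x, y) {
  stopifnot(length(x) == length(y), !is.unsorted(x))
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Check on a curve whose area is known: the standard normal density
set.seed(1)
x_uneven <- sort(c(-10, 10, runif(2000, min = -10, max = 10)))
area_uneven <- trapz(x_uneven, dnorm(x_uneven))   # close to 1

# On an evenly spaced density() grid it agrees with sum(y) * dx
d <- density(rnorm(200))
dx <- d$x[2] - d$x[1]
diff_methods <- abs(trapz(d$x, d$y) - sum(d$y) * dx)  # tiny: only endpoint terms differ

print(c(area_uneven = area_uneven, diff_methods = diff_methods))
```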

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/69738153
