Given a data frame
x <- runif(1000, 0, 10)
y <- c(rep("success", 500), rep("failure", 500))
z <- data.frame(x, y)
is it possible to produce a histogram similar to
library(ggplot2)
ggplot(z, aes(x, fill = y)) + geom_histogram()
but with ..count.. normalized by
attempts = successes + failures
in each bin, using ggplot? Thank you very much for your help.
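EDIT: Thank you very much for all the replies! Sorry, I think I oversimplified the question. A data frame closer to the data I am actually working with is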
df <- data.frame(
  v1 = runif(128000, 0, 10),
  v2 = factor(rep(rep(1:5, c(1,10,8,4,2)), 5120)),
  v3 = factor(rep(rep(1:12, c(2,4,4,6,6,6,6,6,6,6,6,6)), 2000)),
  v4 = c(rep("success", 64000), rep("failure", 64000)))
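except that the real data is not uniformly distributed. To find specific patterns among v1 to v4, I am exploring the data visually, for example with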
ggplot(df, aes(v1, fill = v2)) +
  geom_histogram(binwidth = 0.2, position = "stack") +
  facet_wrap("v3")
and
library(dplyr)  # for %>% and filter()
ggplot(df %>% filter(v4 == "success"), aes(v1, fill = v2)) +
  geom_histogram(binwidth = 0.2, position = "stack") +
  facet_wrap("v3")
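Based on what I have seen so far, I would now like to take this one step further by normalizing ..count.. in the last plot (that is, the successes or the failures) by the total number of attempts in each bin, where attempts = (successes + failures), to get some kind of frequency plot. For example, in v3 facet x, v2 level y, v1 bin z, I would like to see 0.25 (100 successes / 400 attempts) instead of 100 successes.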
EDIT 2: The plot I want is something like this:
# Bin v1 into 5 equal-width intervals
df <- df %>% mutate(v1_bins = cut(v1, breaks = 5))
# Successes per (bin, v2, v3) combination
df_successes <- df %>% group_by(v1_bins, v2, v3, v4) %>%
  filter(v4 == "success") %>% summarise(successes = n()) %>%
  ungroup() %>% select(-v4)
# Attempts (successes + failures) per (bin, v2, v3) combination
df_attempts <- df %>% group_by(v1_bins, v2, v3) %>%
  summarise(attempts = n()) %>% ungroup()
df_freq <- left_join(df_attempts, df_successes, by = c("v1_bins", "v2", "v3")) %>%
  mutate(success_freq = successes / attempts)
which is plotted with
ggplot(df_freq, aes(x = v1_bins, y = success_freq, group = v2)) +
  geom_col(aes(fill = v2), position = "identity", alpha = 0.5) +
  facet_wrap("v3")
or
ggplot(df_freq, aes(x = v1_bins, y = success_freq, group = v2)) +
  geom_line(aes(colour = v2)) +
  facet_wrap("v3")
Answered on 2018-06-05 00:18:15
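I think you want the proportion of successes and failures within each bar of the histogram. One way to do this is to bin the data with cut() and then draw a bar chart with position = "fill", which scales every bar to a height of 1 so that each fill segment shows that group's share of the bin: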
library(dplyr)
library(ggplot2)
z %>%
  mutate(bins = cut(x, breaks = 30)) %>%
  ggplot(aes(bins, fill = y)) +
  geom_bar(position = "fill") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
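To make explicit what position = "fill" is showing, here is a minimal sketch that computes the same per-bin proportions by hand. It assumes dplyr is available; the names bin_props, attempts and prop are illustrative and not part of the original question, and set.seed() is only there to make the sketch reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only for reproducibility of this sketch
x <- runif(1000, 0, 10)
y <- c(rep("success", 500), rep("failure", 500))
z <- data.frame(x, y)

# Count rows per (bin, outcome), then divide by the bin total,
# i.e. the number of attempts (successes + failures) in that bin.
bin_props <- z %>%
  mutate(bins = cut(x, breaks = 30)) %>%
  count(bins, y, name = "n") %>%
  group_by(bins) %>%
  mutate(attempts = sum(n), prop = n / attempts) %>%
  ungroup()

head(bin_props)
```

Within each bin the prop values add up to 1, which is why every bar in the filled bar chart has the same height.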
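EDIT: Based on your edit, it looks like you are trying to get the proportion of successes for each combination of binned v1, v2 and v3. Starting from your data, the plot below shows that. It is really busy. I reduced the number of bins to 10, because 30 looked like too many.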
df <- data.frame(
  v1 = runif(128000, 0, 10),
  v2 = factor(rep(rep(1:5, c(1,10,8,4,2)), 5120)),
  v3 = factor(rep(rep(1:12, c(2,4,4,6,6,6,6,6,6,6,6,6)), 2000)),
  v4 = c(rep("success", 64000), rep("failure", 64000)))
df %>%
  mutate(bins = cut(v1, breaks = 10)) %>%
  group_by(bins, v2, v3) %>%
  # mean() of a logical vector is the share of TRUE values, i.e. successes / attempts
  summarise(success_prop = mean(v4 == "success")) %>%
  ggplot(aes(bins, success_prop, fill = v2)) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5)) +
  facet_wrap(~ v3)
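The summarise() step above relies on the fact that the mean of a logical vector equals successes / attempts. A small sketch on hypothetical toy data (the data frame toy and the label "bin_z" are made up here to mirror the 100 / 400 example from the question) shows the equivalence:

```r
library(dplyr)

# Hypothetical toy data: a single bin with 100 successes out of 400 attempts.
toy <- data.frame(
  bins = "bin_z",
  v4 = c(rep("success", 100), rep("failure", 300))
)

toy_summary <- toy %>%
  group_by(bins) %>%
  summarise(
    successes = sum(v4 == "success"),     # number of TRUE values
    attempts = n(),                       # successes + failures
    success_prop = mean(v4 == "success")  # same as successes / attempts
  )

toy_summary$success_prop  # 0.25
```

This avoids building separate success and attempt tables and joining them, and a group with no successes comes out as 0 rather than as a missing value from the join.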
https://stackoverflow.com/questions/50684274