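I want to find the mean of one numeric variable for each percentile of another numeric variable. Essentially, I want to replicate this graph (Marian et al., 2012), but with my own data:
I tried the following: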
tapply(quantile(CLEARPOND$word_frequency, probs = c(.05, .10, .15, .20, .25,.30,.35,.40,.45,.50,.55,.60,.65,.70,.75,.80,.85,.90,.95)), CLEARPOND$Colthearts_N, mean)
which returns the following error:
Error in tapply(quantile(CLEARPOND$word_frequency, probs = c(0.5, 0.1, : arguments must have same length
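Is there a more logical way to approach this?
I basically want to split the variable word_frequency into bins of 5% increments, then find the mean of Colthearts_N for each bin. Ideally, I would also like to plot this as a scatterplot.
My word_frequency percentiles are as follows: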
5% 10% 15% 20% 25% 30% 35% 40% 45 50% 50% 60% 65% 70% 80% 85% 95% 1 1 1 2 2 3 4 6 11
Any help would be greatly appreciated.
Posted on 2022-09-07 16:31:22
I made up some data to test a solution, and I believe it achieves what you are after.
library(dplyr)

set.seed(42)

# Simulated stand-in for the real data, sorted by word_frequency
CLEARPOND <- data.frame(
  word_frequency = rnorm(1000),
  Colthearts_N = sample(1:100, size = 1000, replace = TRUE)
) |>
  arrange(word_frequency)

# Cut word_frequency at every 5th percentile, then average Colthearts_N per bin
CLEARPOND |>
  mutate(
    bin = cut(
      x = word_frequency,
      breaks = c(
        -Inf,
        quantile(word_frequency, probs = seq(from = 0.05, to = 0.95, by = 0.05)),
        Inf
      )
    )
  ) |>
  group_by(bin) |>
  summarise(avg = mean(Colthearts_N), n = n())

https://stackoverflow.com/questions/73638275
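One caveat with real word-frequency counts: the percentiles quoted in the question contain repeated values (several equal 1 and 2), and `cut()` stops with "'breaks' are not unique" when two breaks coincide. A minimal base-R sketch of one way around this is to bin by rank instead of by quantile value, then plot the bin means. The data below are hypothetical (skewed counts drawn with `rgeom()`), and the names `n_bins`, `bin_means` and `percentile` are illustrative only. It also shows `tapply()` working once both arguments have one entry per row, which is what the original attempt was missing.

```r
set.seed(42)

# Hypothetical stand-in data (not the real CLEARPOND values):
# skewed integer counts with many ties
CLEARPOND <- data.frame(
  word_frequency = rgeom(1000, prob = 0.2) + 1,
  Colthearts_N   = sample(1:100, size = 1000, replace = TRUE)
)

n_bins <- 20  # 20 bins of 5% each

# Rank-based bin index in 1..n_bins. Ties are broken by row order,
# so equal frequencies can end up in neighbouring bins.
r <- rank(CLEARPOND$word_frequency, ties.method = "first")
CLEARPOND$bin <- ceiling(n_bins * r / nrow(CLEARPOND))

# tapply() needs the values and the grouping vector to have the same length
bin_means <- tapply(CLEARPOND$Colthearts_N, CLEARPOND$bin, mean)

# Upper percentile of each bin on the x-axis, mean Colthearts_N on the y-axis
percentile <- as.numeric(names(bin_means)) * 100 / n_bins
plot(percentile, bin_means,
     xlab = "word_frequency percentile",
     ylab = "Mean Colthearts_N")
```

Rank-based binning keeps every bin the same size, at the cost of splitting tied frequencies across bins. If tied values must stay together, an alternative is to pass `unique()` breaks to `cut()`, which yields fewer, unequal bins.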