I have a data table with several soil measurements per day. Soil moisture ranges from 0 to 0.8, and some soil moisture values are NA:
set.seed(24)
df1 <- data.frame(date = sample(seq(as.Date("2015-01-01"),
length.out = 365, by = "1 day"), 5e1, replace = TRUE),
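sm = sample(c(NA, runif(10, min=0, max=0.8)), 5e1, replace = TRUE))
I am trying to calculate the following statistics for each month:
1. the percentage of NA values;
2. the percentage of values in each of four classes (0 to 0.2, 0.2 to 0.4, 0.4 to 0.6 and 0.6 to 0.8).
In the example df1 provided, January has five measurements. One of the five is NA, so NA should account for 20%. There is also a 0.13, which falls in the 0-0.2 class, hence 20%. There are two 0.23 values in the 0.2-0.4 class, hence 40%. Finally, a 0.68 value belongs to the 0.6-0.8 class and makes up 20% of the January total.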
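The January arithmetic can be checked with a short base R sketch. The five values are typed in by hand from the description above rather than extracted from df1, and the names `jan`, `grp` and `jan_pct` are only illustrative:

```r
# January values as described in the question (entered by hand, not read from df1).
jan <- c(NA, 0.13, 0.23, 0.23, 0.68)

# Assign each value to a class; NA stays NA.
grp <- cut(jan, breaks = seq(0, 0.8, by = 0.2))

# Share of each class in percent, counting NA as its own category.
jan_pct <- 100 * table(grp, useNA = "always") / length(jan)
print(jan_pct)
```

The shares come out as 20, 40, 0 and 20 for the four classes and 20 for NA, which matches the January row of the expected result.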
This is the expected result:
month  NA   0-0.2  0.2-0.4  0.4-0.6  0.6-0.8
1      20%  20%    40%      0%       20%
2      0%   0%     50%      25%      25%
3      0%   0%     16.6%    16.6%    66.8%
...
My failed attempt at calculating 1. is as follows:
DT[, .(percentage = 100 * sum(is.na(.SD))/length(.SD)), by=month(DT$date)]
but it produces some meaningless percentage values.
Any ideas on how to get there? Thanks!
Posted on 2018-02-07 06:14:22
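We can try tidyverse. Convert 'date' to Date class (if it is not already), extract the month from 'date', create a grouping variable from the 'sm' column with cut, group by 'month' and 'grp', get the number of elements in each group (n()), divide by the total number of rows for that month, and spread the result to 'wide' format.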
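library(tidyverse)
library(lubridate)  # provides month()
df1 %>%
  group_by(month = month(date)) %>%
  mutate(n = n()) %>%  # total number of rows in each month
  # .add = TRUE keeps the month grouping (dplyr >= 1.0.0; older versions use add = TRUE)
  group_by(grp = cut(sm, breaks = seq(0, 0.8, by = 0.2)), .add = TRUE) %>%
  summarise(perc = 100 * n()/first(n)) %>%
  spread(grp, perc, fill = 0)  # one column per class, 0 where a class is absent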
# A tibble: 12 x 6
# Groups: month [12]
#   month `(0,0.2]` `(0.2,0.4]` `(0.4,0.6]` `(0.6,0.8]` `<NA>`
# * <dbl>     <dbl>       <dbl>       <dbl>       <dbl>  <dbl>
# 1  1.00      20.0        40.0           0        20.0   20.0
# 2  2.00         0        50.0        25.0        25.0      0
# 3  3.00         0        16.7        16.7        66.7      0
# 4  4.00      14.3        42.9        42.9           0      0
# 5  5.00      33.3        16.7           0        50.0      0
# 6  6.00         0         100           0           0      0
# 7  7.00         0        66.7           0           0   33.3
# 8  8.00      20.0        60.0        20.0           0      0
# 9  9.00      14.3        28.6        28.6        14.3   14.3
#10  10.0      50.0        50.0           0           0      0
#11  11.0         0         100           0           0      0
#12  12.0         0        33.3        66.7           0      0
Or using data.table
library(data.table)
library(lubridate)  # provides ymd()
tmp <- setDT(df1)[, n := .N, month(ymd(date))][, .(perc = 100 * .N/n[1]),
         by = .(month = month(ymd(date)),
                grp = cut(sm, breaks = seq(0, 0.8, by = 0.2),
                  labels = c('0 - 0.2', '0.2 - 0.4', '0.4 - 0.6', '0.6 - 0.8')))]
# fill = 0 reports classes that are absent in a month as 0 instead of NA
dcast(tmp, month ~ grp, value.var = 'perc', fill = 0)
Data
set.seed(24)
df1 <- data.frame(date = sample(seq(as.Date("2015-01-01"),
length.out = 365, by = "1 day"), 3e4, replace = TRUE),
sm = sample(c(NA, rnorm(10)), 3e4, replace = TRUE))
https://stackoverflow.com/questions/48657034
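A side note on the attempt in the question: inside `DT[...]`, `.SD` is a data.table holding the group's columns, so `length(.SD)` counts columns rather than rows, and `is.na(.SD)` is evaluated over every column in it. That is likely why the percentages looked meaningless. Below is a minimal sketch of the NA share per month, assuming the question's `date` and `sm` columns and its 50-row sample data; the names `DT` and `na_pct` are only illustrative.

```r
library(data.table)

# Sample data as defined in the question (50 rows).
set.seed(24)
DT <- data.table(
  date = sample(seq(as.Date("2015-01-01"), length.out = 365, by = "1 day"),
                5e1, replace = TRUE),
  sm   = sample(c(NA, runif(10, min = 0, max = 0.8)), 5e1, replace = TRUE)
)

# mean() of a logical vector is the share of TRUE values,
# so this gives the percentage of NA measurements in each month.
na_pct <- DT[, .(percentage = 100 * mean(is.na(sm))),
             keyby = .(month = month(date))]
print(na_pct)
```

Referring to the `sm` column directly avoids `.SD` altogether, and `keyby` returns the months in sorted order.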