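Within each of a set of categories, I'm trying to count how many values from a real dataset appear among a set of simulated values. I can't figure out how to write the R code for it, even though I've been using dplyr.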
Example:
category <- c(1,1,1,1,2,2,2,2,3,3,3,3)
dist <- c(50,20,50,50,70,70,50,50,50,50,70,70)
type <- c("real", "sim", "sim","sim", "real", "sim",
"sim","sim","real", "sim", "sim","sim")
df <- data.frame(category,dist,type)
df
category dist type
1 50 real
1 20 sim
1 50 sim
1 50 sim
2 70 real
2 70 sim
2 50 sim
2 50 sim
3 50 real
3 50 sim
3 70 sim
3 70 sim
What I want:
category count
1 2
2 1
3 1
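Posted on 2019-09-16 17:13:07
The answers above didn't do it for me, but that may be down to how I worded the question. In case anyone else has the same problem, I found that the solution below worked for me.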
library(dplyr)  # needed for %>%, slice() and n()
# getting values of a row
sim <- subset(both, Type == "sim")
snake <- subset(both, Type == "real")
# 1000 was the total number of simulated animals I had
snake <- snake %>% slice(rep(1:n(), 1000))
count <- ifelse(snake$dist >= sim$dist, 1, ifelse(snake$dist < sim$dist,0,NA))
count <- as.data.frame(count)
count <- cbind(count, sim$category)
colnames(count) <- c("binary", "category")
head(count)
totalB <- aggregate(binary~category, count, FUN = sum)
names(totalB)[2] <- 'total'
head(totalB)
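For anyone who wants to try the same idea on the example data from the question, here is a minimal sketch. It assumes the example `df` with lowercase `category`, `dist` and `type` columns (not the `both` data frame used above), and it swaps the row repetition for a join on `category`, so each simulated row is compared with the real value of its own category regardless of row order. The names `real_vals`, `real_dist` and `totals` are made up for illustration.

```r
library(dplyr)

# Example data from the question
category <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
dist <- c(50, 20, 50, 50, 70, 70, 50, 50, 50, 50, 70, 70)
type <- c("real", "sim", "sim", "sim", "real", "sim",
          "sim", "sim", "real", "sim", "sim", "sim")
df <- data.frame(category, dist, type)

# One real value per category (assumption: exactly one "real" row each)
real_vals <- df %>%
  filter(type == "real") %>%
  select(category, real_dist = dist)

# Attach the real value to every simulated row, then count per category
# how many simulated values the real value is greater than or equal to
totals <- df %>%
  filter(type == "sim") %>%
  inner_join(real_vals, by = "category") %>%
  group_by(category) %>%
  summarise(total = sum(real_dist >= dist))

print(totals)
```

As in the code above, a simulated value is counted when the real `dist` is greater than or equal to it; change the comparison inside `sum()` if a different rule is needed.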
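Posted on 2019-08-28 14:45:35
One option is to group_by 'category' and summarise by checking whether the unique 'dist' values of 'type' sim are less than or equal to the 'dist' of the 'real' type: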
library(dplyr)
df %>%
group_by(category) %>%
summarise(count = sum(unique(dist[type == 'sim']) <= dist[type == 'real'][1]))
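If the goal is instead to count the simulated values that are exactly equal to the real value, which is how the expected output in the question appears to read (this is an assumption, not something the question states outright), a variant with `==` and without `unique()` could look like the sketch below. The name `result` is made up for illustration.

```r
library(dplyr)

# Example data from the question
category <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
dist <- c(50, 20, 50, 50, 70, 70, 50, 50, 50, 50, 70, 70)
type <- c("real", "sim", "sim", "sim", "real", "sim",
          "sim", "sim", "real", "sim", "sim", "sim")
df <- data.frame(category, dist, type)

# Per category, count the simulated 'dist' values that equal the
# first 'real' value (assumption: one real value per category)
result <- df %>%
  group_by(category) %>%
  summarise(count = sum(dist[type == "sim"] == dist[type == "real"][1]))

print(result)
```

On the example data this gives counts of 2, 1 and 1 for categories 1, 2 and 3. Exact equality on doubles is fine for whole numbers like these, but measured distances may need a tolerance, for example via `dplyr::near()`.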
Posted on 2019-08-28 15:40:52
One approach with reshape2:
library(reshape2)
df2 <- as.data.frame(table(df))
my_cast <- dcast(df2,type~ category+dist,value.var="Freq")
col = apply(my_cast, 2, function(col) all(col !=0 ))
as.data.frame(t(my_cast[,col][2,]))
# type sim
# 1_50 2
# 2_70 1
# 3_50 1
https://stackoverflow.com/questions/57694976