I'm trying to rearrange a dataset and then sort it by multiple variables. For example, right now I have something like this:
ID  Name      Class 1     Class2   Monday 7-8  Monday 8-9
1   Brad      Chem        Bio      Monday 7-8  NA
2   Charlene  Acct        NA       NA          Monday 8-9
3   Carly     Philosophy  Physics  NA          NA
4   Jess      Chem        Acct     Monday 7-8  Monday 8-9
and I want to rearrange the data like this:
Class       Monday 7-8  Monday 8-9
Acct        Jess        Charlene, Jess
Bio         Brad        NA
Chem        Brad, Jess  Jess
Philosophy  NA          NA
Physics     NA          NA
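I've tried splitting all the variables into separate spreadsheets and then merging them, but I can't figure out how to arrange the names by class and time, and it has proven very difficult. The actual dataset has about 70 different time slots, 80 different people, and 150 different class names (Chem, Bio, etc.), so I can't build the table by hand.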
Posted on 2017-10-06 20:33:18
A tidyr solution:
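library(dplyr)
library(tidyr)

# df1 is the question's data as read into R (column names made syntactic)
df1 %>%
  # stack the two class columns into a single Class column
  gather(class_col, Class, 'Class.1', 'Class2') %>%
  filter(!is.na(Class)) %>%
  # stack the time-slot columns the same way
  gather(date_col, date, 'Monday.7.8', 'Monday.8.9') %>%
  group_by(Class, date) %>%
  summarize(Name = paste(Name, collapse = ", ")) %>%
  # one column per time slot, then drop the column produced by NA slots
  spread(date, Name) %>%
  select(-`<NA>`)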
# # A tibble: 5 x 3
# # Groups:   Class [5]
#   Class      `Monday 7-8` `Monday 8-9`
# * <chr>      <chr>        <chr>
# 1 Acct       Jess         Charlene, Jess
# 2 Bio        Brad         <NA>
# 3 Chem       Brad, Jess   Jess
# 4 Philosophy <NA>         <NA>
# 5 Physics    <NA>         <NA>
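With roughly 70 time slots, naming every column by hand gets tedious. The same reshaping idea can be sketched with tidyr's newer `pivot_longer()` / `pivot_wider()` functions, selecting columns by pattern instead. This is only a sketch under some assumptions: the `df1` below is a hypothetical reconstruction of the question's sample data, the class columns are assumed to all start with "Class", and every column other than `ID`, `Name` and the class columns is assumed to be a time slot. Here the result columns keep the original column names (`Monday.7.8`, `Monday.8.9`) rather than the cell values.

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of the question's sample data
df1 <- data.frame(
  ID = 1:4,
  Name = c("Brad", "Charlene", "Carly", "Jess"),
  Class.1 = c("Chem", "Acct", "Philosophy", "Chem"),
  Class2 = c("Bio", NA, "Physics", "Acct"),
  Monday.7.8 = c("Monday 7-8", NA, NA, "Monday 7-8"),
  Monday.8.9 = c(NA, "Monday 8-9", NA, "Monday 8-9"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # one row per (person, class); rows with a missing class are dropped
  pivot_longer(starts_with("Class"), names_to = "class_col",
               values_to = "Class", values_drop_na = TRUE) %>%
  # assumption: every remaining column is a time slot
  pivot_longer(-c(ID, Name, class_col, Class),
               names_to = "time_col", values_to = "time") %>%
  group_by(Class, time_col) %>%
  # collect the people who are actually booked in this slot, NA if nobody is
  summarise(
    Name = if (any(!is.na(time))) {
      paste(Name[!is.na(time)], collapse = ", ")
    } else {
      NA_character_
    },
    .groups = "drop"
  ) %>%
  # one column per time slot
  pivot_wider(names_from = time_col, values_from = Name)

res
```

Grouping on the slot's column name rather than its cell value means no stray NA column is created, so the final `select()` step is not needed in this variant.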
Posted on 2017-10-06 16:48:09
Here is some base R code for this task:
# sample data from the question
dat <- data.frame(
  name=c("Brad", "Charlene", "Carly", "Jess"),
  class1=c("Chem", "Acct", "Philosophy", "Chem"),
  class2=c("Bio", NA, "Physics", "Acct"),
  monday7.8=c("monday7.8", NA, NA, "monday7.8"),
  monday8.9=c(NA, "monday8.9", NA, "monday8.9"),
  stringsAsFactors=FALSE
)
# every class and time slot that should appear in the result
classes <- c("Acct", "Bio", "Chem", "Philosophy", "Physics")
times <- c("monday7.8", "monday8.9")
# one row per (class, time) combination
ret <- expand.grid(class=classes, time=times, stringsAsFactors=FALSE)
# names of the people who take class cl and are booked in slot tm
one_alloc <- function(cl, tm, dat) {
  idx <- which(!is.na(dat[,tm]) & (dat[,"class1"]==cl | dat[,"class2"]==cl))
  if(length(idx)>0) return(paste(dat[idx,"name"], collapse=", ")) else return(NA)
}
# vectorize over class and time so it can be applied to the whole grid
one_alloc <- Vectorize(one_alloc, vectorize.args=c("cl", "tm"))
ret[,"names"] <- one_alloc(cl=ret[,"class"], tm=ret[,"time"], dat=dat)
# long to wide: one column per time slot
ret <- reshape(ret, timevar="time", idvar="class", direction="wide")
ret
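For the real data (about 150 classes and 70 time slots), typing out the `classes` and `times` vectors is not practical. A possible variation, still in base R, derives both from the data itself. Treat it as a sketch: it assumes the class columns are named `class1` and `class2` and that every column other than `name` and the class columns is a time slot. `class_cols`, `time_cols` and `names_for` are names made up for this example.

```r
# Sample data from the question
dat <- data.frame(
  name = c("Brad", "Charlene", "Carly", "Jess"),
  class1 = c("Chem", "Acct", "Philosophy", "Chem"),
  class2 = c("Bio", NA, "Physics", "Acct"),
  monday7.8 = c("monday7.8", NA, NA, "monday7.8"),
  monday8.9 = c(NA, "monday8.9", NA, "monday8.9"),
  stringsAsFactors = FALSE
)

# Assumption: everything except the name and class columns is a time slot
class_cols <- c("class1", "class2")
time_cols <- setdiff(names(dat), c("name", class_cols))

# All distinct class names; sort() drops NA by default
classes <- sort(unique(unlist(dat[class_cols], use.names = FALSE)))

# Names of the people who take class cl and are booked in slot tm
names_for <- function(cl, tm) {
  in_class <- rowSums(dat[class_cols] == cl, na.rm = TRUE) > 0
  picked <- dat$name[in_class & !is.na(dat[[tm]])]
  if (length(picked) > 0) paste(picked, collapse = ", ") else NA_character_
}

# Build the result one time-slot column at a time
ret <- data.frame(class = classes, stringsAsFactors = FALSE)
for (tm in time_cols) {
  ret[[tm]] <- vapply(classes, names_for, character(1), tm = tm,
                      USE.NAMES = FALSE)
}
ret
```

Filling the wide table column by column also avoids the `reshape()` step, at the cost of an explicit loop over the time slots.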
https://stackoverflow.com/questions/46609607