对于给定的数据集,我希望将我的数据集从长格式转换为宽格式。我使用了重塑函数来做到这一点。
id status timestamp
1 assigned 2017-01-02
1 done 2017-01-03
1 locked 2017-01-04
2 assigned 2017-01-02
2 done 2017-01-03
2 assigned 2017-01-03
2 done 2017-01-04
2 locked 2017-01-05
3 assigned 2017-01-02
3 done 2017-01-03
3 locked 2017-01-04
...
# reshape function to convert long format to Wide.
temp <- reshape(temp, idvar = "id", timevar = "status", direction = "wide")结果:
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
当我这样做时,它删除了一些行,例如:对于id 2,有多行匹配status=assigned,它取第一行。
如何在不删除行的情况下转换为宽。基本上,我不想丢失任何数据。
预期结果:
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
或
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 NA
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04
发布于 2018-07-20 16:43:34
1. cumsum()
使用Esther's approach对每个新赋值进行编号是可行的。
但是,R已经具有可用于此目的的cumsum()函数:
temp$key <- cumsum(temp$status == "assigned")
reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")id key timestamp.assigned timestamp.done timestamp.locked 1: 1 1 2017-01-02 2017-01-03 2017-01-04 2: 2 2 2017-01-02 2017-01-03 <NA> 3: 2 3 2017-01-03 2017-01-04 2017-01-05 4: 3 4 2017-01-02 2017-01-03 2017-01-04
2.分组cumsum()
虽然这解决了OP的原始问题,但key只对所有id上的所有赋值进行编号。如果OP希望为每个id单独对赋值进行编号,我们需要应用按cumsum()分组的id。
实现这一点的一种方法是使用data.table语法:
library(data.table)
setDT(temp)[, key := cumsum(status == "assigned"), by = id]
dcast(temp, id + key ~ status, value.var = "timestamp")id key assigned done locked 1: 1 1 2017-01-02 2017-01-03 2017-01-04 2: 2 1 2017-01-02 2017-01-03 <NA> 3: 2 2 2017-01-03 2017-01-04 2017-01-05 4: 3 1 2017-01-02 2017-01-03 2017-01-04
dcast()是base R的reshape(..., direction = "wide")函数的替代,该函数可从reshape2和data.table包中获得。
3.动态分组cumsum()
data.table的dcast()的公式接口也接受表达式,这样就不需要在整形前通过添加key列来修改temp,而是可以在整形的同时动态完成:
dcast(temp, id + ave(key <- status == "assigned", id, FUN = cumsum) ~
paste0("timestamp.", status))id key timestamp.assigned timestamp.done timestamp.locked 1: 1 1 2017-01-02 2017-01-03 2017-01-04 2: 2 1 2017-01-02 2017-01-03 <NA> 3: 2 2 2017-01-03 2017-01-04 2017-01-05 4: 3 1 2017-01-02 2017-01-03 2017-01-04
数据
library(data.table)
temp <- fread(
"id status timestamp
1 assigned 2017-01-02
1 done 2017-01-03
1 locked 2017-01-04
2 assigned 2017-01-02
2 done 2017-01-03
2 assigned 2017-01-03
2 done 2017-01-04
2 locked 2017-01-05
3 assigned 2017-01-02
3 done 2017-01-03
3 locked 2017-01-04 ")发布于 2018-06-06 01:42:23
您可以做的一件事是添加一个变量,该变量为每个新赋值提供唯一的值。然后你可以用它来整形你的数据
i <- 0
temp$key <- sapply(temp$status, function(x) {
if(x == "assigned") {i <<- i+1; i}
else {i}
})
temp
id status timestamp key
1 1 assigned 2017-01-02 1
2 1 done 2017-01-03 1
3 1 locked 2017-01-04 1
4 2 assigned 2017-01-02 2
5 2 done 2017-01-03 2
6 2 assigned 2017-01-03 3
7 2 done 2017-01-04 3
8 2 locked 2017-01-05 3
9 3 assigned 2017-01-02 4
10 3 done 2017-01-03 4
11 3 locked 2017-01-04 4
temp2 <- reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")
temp2
id key timestamp.assigned timestamp.done timestamp.locked
1 1 1 2017-01-02 2017-01-03 2017-01-04
4 2 2 2017-01-02 2017-01-03 <NA>
6 2 3 2017-01-03 2017-01-04 2017-01-05
9 3 4 2017-01-02 2017-01-03 2017-01-04https://stackoverflow.com/questions/50676344
复制相似问题