I am trying to convert some data into a table of binary indicators (to be used later for clustering). The data I have looks like this:
order_number product_id
34 37552
5 24852
10 24852
15 33290
7 23586
35 22395
4 16766
33 46393
9 12916
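61 12341
What I want is for order_number to become the column header, with a 0 or 1 in each cell depending on whether the given product_id appears in that order. In other words, each order_number should act like a basket. I would like it to look like this: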
order_number
product_id 34 5
37552 1 0
24852 0 1
24852 0 1
Does anyone know how to do this? Any help would be greatly appreciated; I'm stuck.
Answered 2017-08-15 03:51:39
How about a simple table()?
> table(df$product_id, df$order_number, dnn=c("Product ID","Order Number"))
## Order Number
## Product ID 4 5 7 9 10 15 33 34 35 61
## 12341 0 0 0 0 0 0 0 0 0 1
## 12916 0 0 0 1 0 0 0 0 0 0
## 16766 1 0 0 0 0 0 0 0 0 0
## 22395 0 0 0 0 0 0 0 0 1 0
## 23586 0 0 1 0 0 0 0 0 0 0
## 24852 0 1 0 0 1 0 0 0 0 0
## 33290 0 0 0 0 0 1 0 0 0 0
## 37552 0 0 0 0 0 0 0 1 0 0
## 46393 0 0 0 0 0 0 1 0 0 0
Answered 2017-08-15 04:08:50
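I'm not sure why you need the duplicated product_id row, but this gives the exact desired output shown in your question. It isn't clean, though, because the desired output is a little odd: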
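df.out <- df.org
# add one column per order number, initialised to 0
df.out[as.character(df.out$order_number)] <- 0
# append a helper row (row 11) holding the product_id that belongs to each order column
df.out <- rbind(df.out, c("NA", "NA", df.out$product_id))
for (i in 1:(nrow(df.out) - 1)) {
  for (j in 3:ncol(df.out)) {
    # set the cell to 1 when the column's product matches the row's product
    df.out[i, j] <- ifelse(df.out[11, j] == df.out[i, 2], 1, df.out[i, j])
  }
}
# drop the helper row and the order_number column
df.out <- df.out[-11, -1]
df.out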
# product_id 34 5 10 15 7 35 4 33 9 61
# 1 37552 1 0 0 0 0 0 0 0 0 0
# 2 24852 0 1 1 0 0 0 0 0 0 0
# 3 24852 0 1 1 0 0 0 0 0 0 0
# 4 33290 0 0 0 1 0 0 0 0 0 0
# 5 23586 0 0 0 0 1 0 0 0 0 0
# 6 22395 0 0 0 0 0 1 0 0 0 0
# 7 16766 0 0 0 0 0 0 1 0 0 0
# 8 46393 0 0 0 0 0 0 0 1 0 0
# 9 12916 0 0 0 0 0 0 0 0 1 0
# 10 12341 0 0 0 0 0 0 0 0 0 1
Data:
df.org <- structure(list(order_number = c(34L, 5L, 10L, 15L, 7L, 35L, 4L,
33L, 9L, 61L), product_id = c(37552L, 24852L, 24852L, 33290L,
23586L, 22395L, 16766L, 46393L, 12916L, 12341L)), .Names = c("order_number",
"product_id"), class = "data.frame", row.names = c(NA, -10L))
https://stackoverflow.com/questions/45681094
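If the table() result is going to feed a clustering step, it may help to turn it into a plain 0/1 matrix first. The sketch below is only one way to do that. It assumes the sample data from the question, that only presence matters (so any count above 1 is capped at 1), and that orders should end up as rows. The names `basket` and `orders_by_product` are made up for illustration.

```r
# Sample data from the question (same values as df.org above)
df <- data.frame(
  order_number = c(34L, 5L, 10L, 15L, 7L, 35L, 4L, 33L, 9L, 61L),
  product_id   = c(37552L, 24852L, 24852L, 33290L, 23586L,
                   22395L, 16766L, 46393L, 12916L, 12341L)
)

# Count how often each product occurs in each order
counts <- table(df$product_id, df$order_number,
                dnn = c("Product ID", "Order Number"))

# Strip the table class to get an ordinary integer matrix, then cap counts
# at 1 so a repeated product/order pair still yields a 0/1 indicator
# (assumption: only presence matters, not quantity)
basket <- unclass(counts)
basket[basket > 1] <- 1L

# Transpose so each row is an order (basket) and each column a product
# (assumption: orders are the observations to be clustered)
orders_by_product <- t(basket)
```

With the sample data, `basket` has one row per distinct product and one column per order, so product 24852 gets a 1 under both order 5 and order 10.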