I'm stuck on this problem. I have a starting dataset with two columns: an ID and a value.
df <- data.frame(id = c('ABC','XYZ'),
value = c(150, 300))
Then I define how to "tier" those values (in this case, I want to split each value into tiers 100 wide).
cut <- seq(0, 300, 100)
So, for the first record in the dataset, the value is 150. I want to split it into the amounts that fall in the ranges 0-100, 100-200 and 200-300.
Starting dataset
id value
ABC 150
XYZ 300
Ending dataset (after defining cut)
id value val_0_100 val_100_200 val_200_300
ABC 150 100 50 0
XYZ 300 100 100 100
Posted on 2019-04-11 08:18:09
You can do it like this:
df <- data.frame(id = c('ABC','XYZ'),
value = c(150, 300))
initial_value = 0
final_value = 300
step = 100
number_of_columns = ceiling(final_value / step)
for (i in seq_len(number_of_columns)) {
  new_col_name <- paste0("val_", step*(i-1), "_", step*i)
  # Amount of value above this tier's lower bound,
  # capped at the tier width (step, not a hard-coded 100) and floored at 0
  df[[new_col_name]] <- pmax(pmin(df$value - step*(i-1), step), 0)
}
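The same capping logic can also be written without a loop, driven directly by the break points from the question. This is only a sketch: `split_into_tiers` is a hypothetical helper name, and the breaks are stored in `breaks` rather than `cut` so that base R's `cut()` function is not masked. It assumes the breaks are sorted in increasing order.

```r
# Break points from the question; renamed to avoid masking base::cut()
breaks <- seq(0, 300, 100)
df <- data.frame(id = c("ABC", "XYZ"), value = c(150, 300))

lower <- head(breaks, -1)  # lower bound of each tier: 0, 100, 200
upper <- tail(breaks, -1)  # upper bound of each tier: 100, 200, 300

# Hypothetical helper: returns a matrix with one row per value, one column per tier
split_into_tiers <- function(value, lower, upper) {
  # value minus each tier's lower bound (rows = records, columns = tiers)
  above <- outer(value, lower, "-")
  # tier widths laid out in the same shape as `above`
  width <- matrix(upper - lower, nrow = length(value),
                  ncol = length(lower), byrow = TRUE)
  # cap at the tier width, floor at zero
  tiers <- pmax(pmin(above, width), 0)
  colnames(tiers) <- paste0("val_", lower, "_", upper)
  tiers
}

result <- cbind(df, split_into_tiers(df$value, lower, upper))
print(result)
```

Because the tier widths come from `upper - lower`, this version should also cope with unevenly spaced breaks, which the fixed `step` in the loop does not.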
Posted on 2019-04-11 10:32:44
Here is another approach using data.table and dcast
library(data.table)
df <- data.frame(id = c('ABC','XYZ'),
value = c(160, 230))
# Data table
dt <- data.table(df)
# Append Data multiple times based on its value
dt <- dt[rep(seq_len(nrow(dt)), ceiling(dt$value/100)), ]
# cumulative sum to be used in splitting into columns in dcast
dt[, csum := 100]
dt[, csum := cumsum(csum), by = "id"]
# Adding extra column to split into 100s and remainder
dt[, value2 := 100]
dt[csum > value, value2 := value %% 100]
dt[value < 100, value2 := value]
dt_dcast <- dcast(dt, id + value ~ csum, value.var = "value2", fill = 0)
# Rename columns as per the example shown above
colstart <- seq(0, max(dt$csum) - 100, 100)
colend <- seq(100, max(dt$csum), 100)
newname <- c("id", "value", paste0("val_", colstart, "_", colend))
setnames(dt_dcast, names(dt_dcast), newname)
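Whichever approach is used, a quick sanity check is that the tier columns add back up to the original value. A minimal sketch follows, using the expected output from the question as input; `tiers_sum_to_value` is a hypothetical helper name, and the check only holds when every value is no larger than the last break.

```r
# Expected result copied from the question
expected <- data.frame(
  id = c("ABC", "XYZ"),
  value = c(150, 300),
  val_0_100 = c(100, 100),
  val_100_200 = c(50, 100),
  val_200_300 = c(0, 100)
)

# Hypothetical helper: TRUE if the val_* columns sum to the value column
tiers_sum_to_value <- function(x) {
  x <- as.data.frame(x)  # also accepts a data.table such as dt_dcast
  tier_cols <- grep("^val_", names(x), value = TRUE)
  # all.equal() compares with a tolerance, which is safer for non-integer values
  isTRUE(all.equal(unname(rowSums(x[tier_cols])), x$value))
}

ok <- tiers_sum_to_value(expected)
print(ok)
```

The same call could be applied to `df` from the first answer or to `dt_dcast` above.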
https://stackoverflow.com/questions/55619547