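My data has 22 variables. Here is a simplified sample with the variables x1, x2, y1_ and y2_. I want to create a new variable whose value is x1*y1_ + x2*y2_. The code is as follows: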
df <- data.frame(x1=c(0,0,0,1),x2=c(0,0,0,1),y1_=c(3,0,2,1),y2_=c(1,0,0,1))
df$var <- df$x1*df$y1_+df$x2*df$y2_
With 22 variables, writing out every term like this is not practical. So how can I compute this variable?
Posted on 2022-11-18 14:46:54
Split the data by column name, multiply, then sum row-wise:
x <- colnames(df)
df$var <- rowSums(df[, grepl("^x", x)] * df[, grepl("^y", x)])
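The positional pairing above works as long as the x columns and the y columns appear in the same order. As a rough sketch for the 22-variable case, each y name can instead be built from its x name, so column order no longer matters. The frame `wide`, its random contents, and the naming scheme x1..x11 / y1_..y11_ are assumptions extrapolated from the sample, not something given in the question.

```r
# Hypothetical wide frame: x1..x11 and y1_..y11_ filled with made-up values
set.seed(1)
wide <- as.data.frame(matrix(sample(0:3, 4 * 22, replace = TRUE), nrow = 4))
names(wide) <- c(paste0("x", 1:11), paste0("y", 1:11, "_"))

# Reverse the y columns to show that position is not relied upon
wide <- wide[c(paste0("x", 1:11), paste0("y", 11:1, "_"))]

# Pair columns by name: "x7" is matched with "y7_"
xs <- grep("^x[0-9]+$", names(wide), value = TRUE)
ys <- paste0("y", sub("^x", "", xs), "_")
stopifnot(all(ys %in% names(wide)))  # fail early if a partner column is missing

wide$var <- rowSums(wide[xs] * wide[ys])
```

If the real column names follow a different pattern, only the two lines that build `xs` and `ys` would need to change.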
df
# x1 x2 y1_ y2_ var
# 1 0 0 3 1 0
# 2 0 0 0 0 0
# 3 0 0 2 0 0
# 4 1 1 1 1 2
Posted on 2022-11-18 14:44:17
Bottom line:
df$var <- do.call(`+`,
  lapply(split.default(df, gsub(".*([0-9]+)_?$", "\\1", names(df))),
         function(z) apply(z, 1, prod)))
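Two details of the one-liner above hold for the four-column sample but would likely break with 22 variables. First, `+` takes at most two arguments, so `do.call` fails once there are more than two groups. Second, the greedy `.*` leaves only the last digit for the capture group, so x11 would be grouped with x1. A minimal sketch of one way around both, assuming columns named x1..x11 and y1_..y11_ (the frame `wide` and its values are hypothetical):

```r
# Hypothetical wide frame with the same naming scheme as the sample
set.seed(1)
wide <- as.data.frame(matrix(sample(0:3, 4 * 22, replace = TRUE), nrow = 4))
names(wide) <- c(paste0("x", 1:11), paste0("y", 1:11, "_"))

# Keep the whole index ("11", not "1") by dropping every non-digit character
idx <- gsub("\\D", "", names(wide))

# One row-wise product per index, then fold the list with Reduce,
# since `+` itself only accepts two arguments
prods <- lapply(split.default(wide, idx), function(z) apply(z, 1, prod))
wide$var <- Reduce(`+`, prods)
```

This keeps the split-by-index idea and only swaps the grouping key and the final summation step.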
df
# x1 x2 y1_ y2_ var
# 1 0 0 3 1 0
# 2 0 0 0 0 0
# 3 0 0 2 0 0
# 4 1 1 1 1 2
Intermediate steps:
gsub(".*([0-9]+)_?$", "\\1", names(df))
# [1] "1" "2" "1" "2"
split.default(df, gsub(".*([0-9]+)_?$", "\\1", names(df)))
# $`1`
# x1 y1_
# 1 0 3
# 2 0 0
# 3 0 2
# 4 1 1
# $`2`
# x2 y2_
# 1 0 1
# 2 0 0
# 3 0 0
# 4 1 1
lapply(split.default(df, gsub(".*([0-9]+)_?$", "\\1", names(df))),
       function(z) apply(z, 1, prod))
# $`1`
# [1] 0 0 0 1
# $`2`
# [1] 0 0 0 1
Posted on 2022-11-18 15:12:04
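1) Using dplyr, and assuming that the columns ending in a digit are in the same order as the columns ending in an underscore, and that the two groups are identified by ending in a digit and in an underscore respectively, we can use across like this.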
library(dplyr)
df %>% mutate(var = rowSums(across(matches("\\d$")) * across(ends_with("_"))))
giving
x1 x2 y1_ y2_ var
1 0 0 3 1 0
2 0 0 0 0 0
3 0 0 2 0 0
4 1 1 1 1 2
2) A variant of this uses rowwise:
df %>%
  rowwise %>%
  mutate(var = sum(c_across(matches("\\d$")) * c_across(ends_with("_")))) %>%
  ungroup
Note
df <- structure(list(x1 = c(0, 0, 0, 1), x2 = c(0, 0, 0, 1), y1_ = c(3,
0, 2, 1), y2_ = c(1, 0, 0, 1)), class = "data.frame", row.names = c(NA,
-4L))
df
## x1 x2 y1_ y2_
## 1 0 0 3 1
## 2 0 0 0 0
## 3 0 0 2 0
## 4 1 1 1 1
https://stackoverflow.com/questions/74491279