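I have two data frames with different information (only one column is the same in both data frames: subject). Also, one of the data frames is larger than the other (more columns and more rows).
So I have something like this: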
# dataframe 1
subject var1 var2
101 A B
102 C D
103 E F
...
# dataframe 2
subject varW varX varY varZ
101 1 2 1 4
101 2 1 1 4
101 4 1 1 4
102 2 1 2 5
102 1 1 2 5
102 2 4 2 5
103 2 3 3 1
103 1 2 3 1
103 4 1 3 1
Note that each subject is repeated several times in dataframe 2, whereas in dataframe 1 each subject appears only once.
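What I want to do is append the columns varY and varZ to dataframe 1.
I tried using select (to pick those two columns) and then inner_join (to join the two data frames). However, I end up with 3 rows per subject, because each subject is repeated 3 times in dataframe 2. After the join, I want only 1 row per subject from dataframe 1, since each subject has only one value of varY and one value of varZ.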
In other words, this is the output I want:
subject var1 var2 varY varZ
101 A B 1 4
102 C D 2 5
103 E F 3 1
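To see where the extra rows come from, here is a minimal sketch in base R. It uses merge() instead of dplyr's inner_join (an assumption made only to keep the example dependency-free); both perform an inner join, so each dataframe 1 row is paired with every dataframe 2 row that shares its subject.

```r
# Rebuild the example data from the question (base R only, no packages)
df1 <- data.frame(subject = 101:103,
                  var1 = c("A", "C", "E"),
                  var2 = c("B", "D", "F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(subject = rep(101:103, each = 3),
                  varW = c(1, 2, 4, 2, 1, 2, 2, 1, 4),
                  varX = c(2, 1, 1, 1, 1, 4, 3, 2, 1),
                  varY = rep(c(1, 2, 3), each = 3),
                  varZ = rep(c(4, 5, 1), each = 3))

# merge() with its default all = FALSE is an inner join: each df1 row is
# matched with every df2 row that has the same subject
joined <- merge(df1, df2[, c("subject", "varY", "varZ")], by = "subject")
nrow(joined)           # 9 rows: 3 subjects x 3 matching df2 rows each
table(joined$subject)  # every subject appears 3 times
```

Selecting the columns alone does not help; the duplicates have to be removed from dataframe 2 (or from the joined result) before you get one row per subject.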
Posted on 2021-04-21 08:32:13
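If varY and varZ have a single unique value per subject, use distinct to collapse df2 to one row per subject, join that to df1, and then reorder the columns with select: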
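library(dplyr)
# One row per subject/varY/varZ combination, then join onto df1
distinct(df2, subject, varY, varZ) %>%
  right_join(df1, by = "subject") %>%
  # Put df1's columns first, followed by varY and varZ
  select(names(df1), everything())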
Output:
# subject var1 var2 varY varZ
#1 101 A B 1 4
#2 102 C D 2 5
#3 103 E F 3 1
Data:
df1 <- structure(list(subject = 101:103, var1 = c("A", "C", "E"),
var2 = c("B",
"D", "F")), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(subject = c(101L, 101L, 101L, 102L, 102L, 102L,
103L, 103L, 103L), varW = c(1L, 2L, 4L, 2L, 1L, 2L, 2L, 1L, 4L
), varX = c(2L, 1L, 1L, 1L, 1L, 4L, 3L, 2L, 1L), varY = c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), varZ = c(4L, 4L, 4L, 5L, 5L,
5L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -9L
))
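The distinct approach only gives one row per subject if varY and varZ really are constant within each subject. Below is a base R sketch of the same idea, with unique() standing in for distinct() and merge(all.x = TRUE) standing in for the join. It adds an explicit check of that assumption, so a subject with conflicting varY/varZ values stops with an error instead of silently duplicating rows.

```r
# Example data from the question (base R only)
df1 <- data.frame(subject = 101:103,
                  var1 = c("A", "C", "E"),
                  var2 = c("B", "D", "F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(subject = rep(101:103, each = 3),
                  varW = c(1, 2, 4, 2, 1, 2, 2, 1, 4),
                  varX = c(2, 1, 1, 1, 1, 4, 3, 2, 1),
                  varY = rep(c(1, 2, 3), each = 3),
                  varZ = rep(c(4, 5, 1), each = 3))

# Counterpart of distinct(df2, subject, varY, varZ): unique rows of these columns
lookup <- unique(df2[, c("subject", "varY", "varZ")])

# Guard: if varY/varZ varied within a subject, that subject would still
# appear more than once here, and the join would duplicate df1 rows
stopifnot(anyDuplicated(lookup$subject) == 0)

# all.x = TRUE keeps every row of df1 (a left join on df1)
result <- merge(df1, lookup, by = "subject", all.x = TRUE)
result
```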
Posted on 2021-04-21 08:36:05
Using data.table:
library(data.table)
# Take the first varY/varZ row per subject from df2, then join it onto df1
df1[df2[, .SD[1], .SDcols = c("varY", "varZ"), by = subject], on = "subject"]
Or (thanks to a comment from @thelatemail):
df2[, .(subject, varY, varZ)][df1, on=.(subject), mult="first"]
Data:
df1 = fread("
subject var1 var2
101 A B
102 C D
103 E F
")
df2 = fread("
subject varW varX varY varZ
101 1 2 1 4
101 2 1 1 4
101 4 1 1 4
102 2 1 2 5
102 1 1 2 5
102 2 4 2 5
103 2 3 3 1
103 1 2 3 1
103 4 1 3 1
")
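Both data.table versions keep the first df2 row per subject (.SD[1] or mult = "first"). For comparison, here is a sketch of the same "first match" lookup in base R with match(), which is a swapped-in technique and not part of the answer above. match() returns the index of the first df2 row for each df1 subject, and a subject with no match in df2 would get NA, like a left join.

```r
# Example data from the question (base R only)
df1 <- data.frame(subject = 101:103,
                  var1 = c("A", "C", "E"),
                  var2 = c("B", "D", "F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(subject = rep(101:103, each = 3),
                  varW = c(1, 2, 4, 2, 1, 2, 2, 1, 4),
                  varX = c(2, 1, 1, 1, 1, 4, 3, 2, 1),
                  varY = rep(c(1, 2, 3), each = 3),
                  varZ = rep(c(4, 5, 1), each = 3))

# Index of the FIRST df2 row for each subject in df1 (here rows 1, 4 and 7)
idx <- match(df1$subject, df2$subject)

# Copy df1 and pull varY/varZ from those rows, keeping df1's row order
out <- df1
out$varY <- df2$varY[idx]
out$varZ <- df2$varZ[idx]
out
```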
Posted on 2021-04-21 08:37:05
I hope this is what you're looking for:
library(dplyr)
library(tidyr)
df2 %>%
left_join(df1, by = "subject") %>%
select(-c(varX, varW)) %>%
group_by(subject) %>%
slice_head(n = 1) %>%
relocate(subject, var1, var2)
# A tibble: 3 x 5
# Groups: subject [3]
subject var1 var2 varY varZ
<dbl> <chr> <chr> <dbl> <dbl>
1 101 A B 1 4
2 102 C D 2 5
3 103 E F 3 1
Data:
df2 <- tribble(
~subject, ~varW, ~varX, ~varY, ~varZ,
101, 1, 2, 1, 4,
101, 2, 1, 1, 4,
101, 4, 1, 1, 4,
102, 2, 1, 2, 5,
102, 1, 1, 2, 5,
102, 2, 4, 2, 5,
103, 2, 3, 3, 1,
103, 1, 2, 3, 1,
103, 4, 1, 3, 1
)
df1 <- tribble(
~subject, ~var1, ~var2,
101, "A", "B",
102, "C", "D",
103, "E", "F"
)
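This answer reverses the order of the first one: it joins first and then keeps one row per subject (slice_head(n = 1) within each group). Here is a base R sketch of that join-then-deduplicate order, assuming merge() for the join and duplicated() for "first row per subject". Unlike the dplyr pipeline, the result is a plain data.frame with no grouping attached.

```r
# Example data from the question (base R only)
df1 <- data.frame(subject = 101:103,
                  var1 = c("A", "C", "E"),
                  var2 = c("B", "D", "F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(subject = rep(101:103, each = 3),
                  varW = c(1, 2, 4, 2, 1, 2, 2, 1, 4),
                  varX = c(2, 1, 1, 1, 1, 4, 3, 2, 1),
                  varY = rep(c(1, 2, 3), each = 3),
                  varZ = rep(c(4, 5, 1), each = 3))

# Join first: every df2 row receives var1/var2 from df1 (9 rows in total)
merged <- merge(df2, df1, by = "subject")

# Then keep the first row of each subject and drop varW/varX
first_rows <- merged[!duplicated(merged$subject),
                     c("subject", "var1", "var2", "varY", "varZ")]
rownames(first_rows) <- NULL
first_rows
```

Deduplicating before the join (as in the distinct answer) keeps the intermediate table small. Joining first is simpler to write but briefly materializes every df2 row.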
https://stackoverflow.com/questions/67187846