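I want to add a prefix to all columns in a left join.
left_join() can add a suffix when the two joined tables share column names. However, it has no option to always add that suffix when the names differ, and no option to add a prefix instead.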
library(dplyr)
library(nycflights13)
flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)
airports2 <- airports
result <- flights2 %>% left_join(airports2, c("dest" = "faa")) %>% head()
Result:
Source: local data frame [6 x 14]
year month day hour origin dest tailnum carrier name
(int) (int) (int) (dbl) (chr) (chr) (chr) (chr) (chr)
1 2013 1 1 5 EWR IAH N14228 UA George Bush Intercontinental
2 2013 1 1 5 LGA IAH N24211 UA George Bush Intercontinental
3 2013 1 1 5 JFK MIA N619AA AA Miami Intl
4 2013 1 1 5 JFK BQN N804JB B6 NA
5 2013 1 1 5 LGA ATL N668DN DL Hartsfield Jackson Atlanta Intl
6 2013 1 1 5 EWR ORD N39463 UA Chicago Ohare Intl
Variables not shown: lat (dbl), lon (dbl), alt (int), tz (dbl), dst (chr)
Here, it is not possible to tell from the join result alone which original table each column came from.
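To make the limitation concrete, here is a minimal sketch with two toy tables (hypothetical data, not taken from nycflights13). It shows that the `suffix` argument of left_join() only touches non-key columns whose names clash, so `x` and `y` stay unmarked:

```r
library(dplyr)

# Toy tables (hypothetical data): "name" exists in both, "x" and "y" do not
left  <- tibble(id = 1:2, name = c("a", "b"), x = c(10, 20))
right <- tibble(id = 1:2, name = c("A", "B"), y = c(TRUE, FALSE))

joined <- left_join(left, right, by = "id", suffix = c(".left", ".right"))

# Only the clashing column "name" receives a suffix
names(joined)
# "id" "name.left" "x" "name.right" "y"
```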
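The purpose of adding this prefix is to compute column names reliably from the table names and column names of data loaded from a relational database. For example, the database structure and the naming conventions stored in the relational database will be loaded into R and used to identify primary keys and foreign keys. These will then be used to set up the join and, later, to retrieve data from the join result.
I found a similar question for MySQL, but none for R: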
In a join, how to prefix all column names with the table it came from
Answered 2016-10-24 20:01:17
A simple way to achieve this is to add the prefix to the original tables before performing the join:
# add prefix before joining:
names(flights2) <- paste0("flights2.", names(flights2) )
names(airports2) <- paste0("airports2.", names(airports2) )
# in join, use names with prefixes
result <- flights2 %>% left_join(airports2, c("flights2.dest" = "airports2.faa") ) %>% head()
Result:
Source: local data frame [6 x 14]
flights2.year flights2.month flights2.day flights2.hour flights2.origin flights2.dest
(int) (int) (int) (dbl) (chr) (chr)
1 2013 1 1 5 EWR IAH
2 2013 1 1 5 LGA IAH
3 2013 1 1 5 JFK MIA
4 2013 1 1 5 JFK BQN
5 2013 1 1 5 LGA ATL
6 2013 1 1 5 EWR ORD
Variables not shown: flights2.tailnum (chr), flights2.carrier (chr), airports2.name (chr),
airports2.lat (dbl), airports2.lon (dbl), airports2.alt (int), airports2.tz (dbl),
airports2.dst (chr)
Now the columns of the joined data frame can easily be referenced as tableName.columnName.
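The same idea can be sketched without overwriting the names of the original tables. The helper `add_table_prefix()` below is a hypothetical name introduced for illustration, the two tibbles are toy stand-ins for flights2 and airports2, and rename_with() is assumed to be available (dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical helper: returns a copy of df with every column
# renamed to "<table_name>.<column>"
add_table_prefix <- function(df, table_name) {
  rename_with(df, ~ paste0(table_name, ".", .x))
}

# Toy stand-ins for the flights2 and airports2 tables
flights2  <- tibble(dest = c("IAH", "BQN"), carrier = c("UA", "B6"))
airports2 <- tibble(faa = "IAH", name = "George Bush Intercontinental")

result <- add_table_prefix(flights2, "flights2") %>%
  left_join(
    add_table_prefix(airports2, "airports2"),
    by = c("flights2.dest" = "airports2.faa")
  )

# The originals keep their names; only the join result is prefixed
names(result)
# "flights2.dest" "flights2.carrier" "airports2.name"
```

Because the helper returns renamed copies, the original tables can still be used elsewhere under their plain column names.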
Answered 2019-01-19 00:19:03
A similar answer, but using suffixes and a single pipeline:
Edit: updated because funs() is soft-deprecated as of dplyr 0.8.0.
library(dplyr)
(band_members
%>% rename_all( list(~paste0(., ".left"))) # dplyr < 0.8.0: %>% rename_all( funs(paste0(., ".left")))
%>% left_join(band_instruments, by = c("name.left"="name"))
%>% rename_at( .vars = vars(-ends_with(".left")),list(~paste0(., ".right"))) # dplyr < 0.8.0: %>% rename_at( .vars = vars(-ends_with(".left")),funs(paste0(., ".right")))
)
Or:
(band_members
%>% rename_all( list(~paste0(., ".left"))) # dplyr < 0.8.0: %>% rename_all( funs(paste0(., ".left")))
%>% left_join(
band_instruments %>% rename_all( list(~paste0(., ".right"))), # dplyr < 0.8.0: band_instruments %>% rename_all( funs(paste0(., ".right")))
by = c("name.left"="name.right")
)
)
Both give:
# A tibble: 3 x 3
name.left band.left plays.right
<chr> <chr> <chr>
1 Mick Stones <NA>
2 John Beatles guitar
3 Paul Beatles bass
Edit: using the data from the dplyr documentation:
library(tidyverse)
band_members <- tribble(
~ name, ~ band,
"Mick", "Stones",
"John", "Beatles",
"Paul", "Beatles"
)
band_instruments <- tribble(
~ name, ~ plays,
"John", "guitar",
"Paul", "bass",
"Keith", "guitar"
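)
Answered 2022-09-13 22:14:36
Updated for dplyr 1.0.10.
If only a few columns of the joined table are needed, you can manage them directly with select(), renaming them as you select.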
# oid_, in_fid, near_fid, near_dist, near_rank
dfArcNearBy %>%
left_join(
(dfArcBuilds %>% select(inFidZip5 = user_zip5, inFidUuId = user_uuid, oid_)),
by = c("in_fid" = "oid_")
) %>%
left_join(
(dfArcBuilds %>% select(nearFidZip5 = user_zip5, nearFidUuId = user_uuid, oid_)),
by = c("near_fid" = "oid_")
) %>%
{.}
If you want a pattern-based prefix or suffix, you can use rename_with() together with gsub().
# oid_, in_fid, near_fid, near_dist, near_rank
dfArcNearBy %>%
left_join(
(dfArcBuilds %>% rename_with(.fn = ~ gsub("user_", "inFid_", .x, fixed = TRUE), .cols = everything())),
by = c("in_fid" = "oid_")
) %>%
{.}
If you want to add a prefix or suffix to every column, you can use rename_with() together with paste0().
# oid_, in_fid, near_fid, near_dist, near_rank
dfArcNearBy %>%
left_join(
# prefix "inFid_" matches the join key below; pass a further string to paste0() after .x to add a suffix
(dfArcBuilds %>% rename_with(.fn = ~ paste0("inFid_", .x))),
by = c("in_fid" = "inFid_oid_")
) %>%
{.}
https://stackoverflow.com/questions/40226561
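The dfArcNearBy and dfArcBuilds tables used in the last answer are not shown, so the rename_with() pattern can be tried on the band_members and band_instruments datasets bundled with dplyr instead. This sketch also uses the `.cols` argument to leave the join key unprefixed, which is a variation on the answer rather than what it does; the prefix `"instr_"` is an arbitrary choice for illustration:

```r
library(dplyr)

result <- band_members %>%
  left_join(
    # Prefix every column of the right table except the key "name"
    band_instruments %>%
      rename_with(~ paste0("instr_", .x), .cols = -name),
    by = "name"
  )

names(result)
# "name" "band" "instr_plays"
```

Keeping the key unprefixed means the `by` argument does not need to know the prefix, at the cost of the key column not showing which table it came from.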