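I'm trying to populate a new column in a dataset. I have a dataset with information about football matches. It has a column called "Stadium" that contains various stadium names. I want to add a new column containing the country each stadium is located in. My data frame looks like this: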
Match ID  Stadium
1         Anfield
2         Camp Nou
3         Stadio Olimpico
4         Anfield
5         Emirates
I'm trying to create a new column, like this:
Match ID  Stadium          Country
1         Anfield          England
2         Camp Nou         Spain
3         Stadio Olimpico  Italy
4         Anfield          England
5         Emirates         England
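There are only a few stadiums but a lot of rows, so I'm looking for a way to avoid entering the values by hand. Any tips?
Posted on 2022-08-10 00:09:12
You want to get the unique stadium names from your data, manually create a vector of the country corresponding to each stadium, and then join the two using the stadium as the key.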
library(dplyr)
# Example data. Note that data.frame() renames `Match ID` to Match.ID
# because check.names = TRUE by default.
df <- data.frame(`Match ID` = 1:12,
                 Stadium = rep(c("Stadio Olimpico", "Anfield",
                                 "Emirates"), 4))
# Get the unique stadium names in a vector
unique_stadiums <- df %>% pull(Stadium) %>% unique()
unique_stadiums
#> [1] "Stadio Olimpico" "Anfield"         "Emirates"
# Manually create a vector of country names corresponding to each element of
# the unique stadium name vector. Ordering matters here!
countries <- c("Italy", "England", "England")
# Place them both into a data.frame
lookup <- data.frame(Stadium = unique_stadiums, Country = countries)
# Join the country names to the original data on the stadium key
left_join(x = df, y = lookup, by = "Stadium")
#>    Match.ID         Stadium Country
#> 1         1 Stadio Olimpico   Italy
#> 2         2         Anfield England
#> 3         3        Emirates England
#> 4         4 Stadio Olimpico   Italy
#> 5         5         Anfield England
#> 6         6        Emirates England
#> 7         7 Stadio Olimpico   Italy
#> 8         8         Anfield England
#> 9         9        Emirates England
#> 10       10 Stadio Olimpico   Italy
#> 11       11         Anfield England
#> 12       12        Emirates England
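Because the join approach depends on the countries being typed in the same order as the unique stadium names, a named vector may be a safer way to build the lookup. The following is only a sketch in base R, assuming the sample data from the question; the names `matches`, `country_by_stadium`, and `missing_stadiums` are made up for illustration and are not part of the original answer.

```r
# Sample data from the question (hypothetical object name `matches`).
# check.names = FALSE keeps the space in the "Match ID" column name.
matches <- data.frame(
  `Match ID` = 1:5,
  Stadium = c("Anfield", "Camp Nou", "Stadio Olimpico", "Anfield", "Emirates"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Named vector: names are stadiums, values are countries.
# Each pair is written together, so the order of entries does not matter.
country_by_stadium <- c(
  "Anfield"         = "England",
  "Camp Nou"        = "Spain",
  "Stadio Olimpico" = "Italy",
  "Emirates"        = "England"
)

# Look up each row's stadium by name; stadiums absent from the
# lookup come back as NA instead of raising an error.
matches$Country <- unname(country_by_stadium[as.character(matches$Stadium)])

# Stadiums that still need an entry in the lookup (empty if all matched).
missing_stadiums <- unique(matches$Stadium[is.na(matches$Country)])
```

Checking `missing_stadiums` after the lookup is a quick way to catch spelling differences between the data and the lookup, such as "Stadio Olimpico" versus "Stadio Olympico".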
https://stackoverflow.com/questions/73292600