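I have a df of school addresses spanning many years, but because of inconsistent data entry, some schools' addresses are written differently from year to year.
I am trying to fix them with str_replace, for example converting "St." to "Street", but sometimes I need to specify that the pattern sits at the end of the character vector. For example, "West St" should become "West Street", but if I replace "St" with "Street", the addresses that were already written correctly become "West Streetreet".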
library(tibble)   # tribble()
library(dplyr)    # %>% and mutate()
library(stringr)  # str_replace()

test <- tribble(
  ~name,      ~year, ~address,
  "school 1", 2000,  "1 Main Ave",
  "school 1", 2001,  "1 Main Avenue",
  "school 1", 2002,  "1 Main Ave",
  "school 1", 2004,  "1 Main Avenue",
  "school 2", 2000,  "200 West St",
  "school 2", 2001,  "200 West Street",
  "school 2", 2002,  "200 West St",
  "school 2", 2004,  "200 West St",
  "school 3", 2000,  "2759 Lakeshore Road",
  "school 3", 2001,  "2759 Lakeshore Road",
  "school 3", 2002,  "2759 Lakeshore Rd",
  "school 3", 2004,  "2759 Lakeshore Rd"
)
test %>%
  mutate(address2 = str_replace(address, "Rd", "Road"),
         address2 = str_replace(address2, "Ave", "Avenue"),
         address2 = str_replace(address2, "St", "Street"))

This returns:
# A tibble: 12 × 4
   name      year address             address2
   <chr>    <dbl> <chr>               <chr>
 1 school 1  2000 1 Main Ave          1 Main Avenue
 2 school 1  2001 1 Main Avenue       1 Main Avenuenue
 3 school 1  2002 1 Main Ave          1 Main Avenue
 4 school 1  2004 1 Main Avenue       1 Main Avenuenue
 5 school 2  2000 200 West St         200 West Street
 6 school 2  2001 200 West Street     200 West Streetreet
 7 school 2  2002 200 West St         200 West Street
 8 school 2  2004 200 West St         200 West Street
 9 school 3  2000 2759 Lakeshore Road 2759 Lakeshore Road
10 school 3  2001 2759 Lakeshore Road 2759 Lakeshore Road
11 school 3  2002 2759 Lakeshore Rd   2759 Lakeshore Road
12 school 3  2004 2759 Lakeshore Rd   2759 Lakeshore Road

This is obviously incorrect. How do I specify that a pattern should be changed only when it ends with "St"?
Answered 2021-10-20 10:27:58
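You can use \\b to mark a word boundary. That way St can appear anywhere in the string, and it is replaced only when it is a complete word on its own, so the "St" inside "Street" is left alone.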
library(dplyr)
library(stringr)
test %>%
  mutate(address2 = str_replace(address, "\\bRd\\b", "Road"),
         address2 = str_replace(address2, "\\bAve\\b", "Avenue"),
         address2 = str_replace(address2, "\\bSt\\b", "Street"))

However, if you create a named vector that maps each pattern to its replacement, this becomes a one-liner with str_replace_all:

# Names are the regex patterns, values are the replacements
pat <- setNames(c("Road", "Avenue", "Street"),
                c("\\bRd\\b", "\\bAve\\b", "\\bSt\\b"))
test %>% mutate(address2 = str_replace_all(address, pat))
#   name      year address             address2
#   <chr>    <dbl> <chr>               <chr>
# 1 school 1  2000 1 Main Ave          1 Main Avenue
# 2 school 1  2001 1 Main Avenue       1 Main Avenue
# 3 school 1  2002 1 Main Ave          1 Main Avenue
# 4 school 1  2004 1 Main Avenue       1 Main Avenue
# 5 school 2  2000 200 West St         200 West Street
# 6 school 2  2001 200 West Street     200 West Street
# 7 school 2  2002 200 West St         200 West Street
# 8 school 2  2004 200 West St         200 West Street
# 9 school 3  2000 2759 Lakeshore Road 2759 Lakeshore Road
#10 school 3  2001 2759 Lakeshore Road 2759 Lakeshore Road
#11 school 3  2002 2759 Lakeshore Rd   2759 Lakeshore Road
#12 school 3  2004 2759 Lakeshore Rd   2759 Lakeshore Road

https://stackoverflow.com/questions/69643871
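
The question also mentions abbreviations written with a period ("St.") and matching only at the end of the address. A minimal sketch of that variant follows, assuming the street type is always the last token of the address. The `addresses` vector and the names `end_pat` and `cleaned` are made up for illustration and are not part of the data above.

```r
library(stringr)

# Hypothetical sample addresses, not taken from the data in the question
addresses <- c("200 West St", "200 West St.", "200 West Street", "12 St Clair Rd")

# "\\b" requires the abbreviation to start at a word boundary,
# "\\.?" also consumes an optional trailing period,
# "$" anchors the match to the end of the string
end_pat <- setNames(c("Road", "Avenue", "Street"),
                    c("\\bRd\\.?$", "\\bAve\\.?$", "\\bSt\\.?$"))

cleaned <- str_replace_all(addresses, end_pat)
cleaned
# "200 West Street" "200 West Street" "200 West Street" "12 St Clair Road"
```

Anchoring with `$` leaves a "St" in the middle of an address (for example as part of a street name) untouched, at the cost of missing abbreviations that are followed by anything else, such as a unit number.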