I am running a regex query in R.
df<- c("955 - 959 Fake Street","95-99 Fake Street","4-9 M4 Ln","95 - 99 Fake Street","99 Fake Street")
955 - 959 Fake Street
95-99 Fake Street
4-9 M4 Ln
95 - 99 Fake Street
99 Fake Street
I am trying to split these addresses into two columns.
I expected that:
strsplit(df, "\\d+(\\s*-\\s*\\d+)?", perl=T)
would separate the numbers on the left from the rest of the address on the right.
What I get instead is:
[1] "" " Fake Street"
[1] "" " Fake Street"
[1] "" " M" " Ln"
[1] "" " Fake Street"
[1] "" " Fake Street"
The string split function seems to remove the text that was matched to split the string. Is there any way to keep it?
Thanks
Answered 2017-05-05 05:01:16
You can use a lookbehind and a lookahead to split at the whitespace between a digit and a letter:
strsplit(df, "(?<=\\d)\\s(?=[[:alpha:]])", perl = TRUE)
# [[1]]
# [1] "955 - 959" "Fake Street"
#
# [[2]]
# [1] "95-99" "Fake Street"
#
# [[3]]
# [1] "4-9" "M4" "Ln"
#
# [[4]]
# [1] "95 - 99" "Fake Street"
#
# [[5]]
# [1] "99" "Fake Street"
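However, this also splits at the space between "M4" and "Ln". If your addresses always have the format "number (possibly a range), then the rest of the address", you can extract the two parts separately (as suggested by @d.b):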
splitDf <- data.frame(
numberPart = sub("(\\d+(\\s*-\\s*\\d+)?)(.*)", "\\1", df),
rest = trimws(sub("(\\d+(\\s*-\\s*\\d+)?)(.*)", "\\3", df)))
splitDf
#   numberPart        rest
# 1  955 - 959 Fake Street
# 2      95-99 Fake Street
# 3        4-9       M4 Ln
# 4    95 - 99 Fake Street
# 5         99 Fake Street
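As a variation on the two sub() calls above, both parts can be captured in a single pass with regexec() and regmatches() from base R. This is only a sketch under the same assumption (every address starts with a number or a number range); the names addresses, pattern, parts and splitDf2 are made up for illustration.

```r
# Hypothetical variable names; the data is the same as in the question.
addresses <- c("955 - 959 Fake Street", "95-99 Fake Street", "4-9 M4 Ln",
               "95 - 99 Fake Street", "99 Fake Street")

# Group 1: leading number or number range; group 2: the rest of the address.
# The whitespace between the two groups is matched but not captured.
pattern <- "^(\\d+(?:\\s*-\\s*\\d+)?)\\s*(.*)$"

# regexec() records the positions of the full match and of each group;
# regmatches() turns them into character vectors (full match, group 1, group 2).
matches <- regmatches(addresses, regexec(pattern, addresses, perl = TRUE))

# Assumption: every address matches. A non-matching string would yield
# character(0) and silently drop out of the rbind() below.
parts <- do.call(rbind, matches)

splitDf2 <- data.frame(numberPart = parts[, 2],
                       rest       = parts[, 3],
                       stringsAsFactors = FALSE)
splitDf2
```

Because the separating whitespace sits outside both groups, no trimws() call is needed here.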
Answered 2017-05-05 05:05:51
You were almost there: just append \\K\\s* to your regex and add the start-of-string anchor ^ at the beginning:
df<- c("955 - 959 Fake Street","95-99 Fake Street","4-9 M4 Ln","95 - 99 Fake Street","99 Fake Street")
strsplit(df, "^\\d+(\\s*-\\s*\\d+)?\\K\\s*", perl = TRUE)
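\K is a match reset operator that discards the text matched so far. So, after matching 1+ digits, optionally followed by 0+ whitespaces, a -, 0+ whitespaces and 1+ digits at the start of the string, all of that matched text is dropped from the match. Only the 0+ whitespaces that follow end up in the match value, and they are what the string is split on.
See the R demo output: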
[[1]]
[1] "955 - 959" "Fake Street"
[[2]]
[1] "95-99" "Fake Street"
[[3]]
[1] "4-9" "M4 Ln"
[[4]]
[1] "95 - 99" "Fake Street"
[[5]]
[1] "99" "Fake Street"
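To check what \K actually leaves in the match value, the same pattern can be passed to regexpr() and the match pulled out with regmatches(). This is a small sketch for inspection only; the names addresses, pattern, m and delims are made up for illustration.

```r
# Hypothetical variable names; the data is the same as in the question.
addresses <- c("955 - 959 Fake Street", "95-99 Fake Street", "4-9 M4 Ln",
               "95 - 99 Fake Street", "99 Fake Street")

pattern <- "^\\d+(\\s*-\\s*\\d+)?\\K\\s*"

# regexpr() finds the first match in each string; with perl = TRUE the
# reported match begins at the \K position, not at the start of the string.
m <- regexpr(pattern, addresses, perl = TRUE)

# The matched text is what strsplit() would treat as the delimiter.
delims <- regmatches(addresses, m)
print(delims)

# Length of each reported match, in characters.
print(attr(m, "match.length"))
```

For this data the reported match should be just the whitespace after the number part, which is why the numbers survive the split.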
https://stackoverflow.com/questions/43795753