A Detailed Guide to String Functions in R

CDA数据分析师

Published 2018-02-08 14:36:33

I. String-handling functions in the stringr package:

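1. Case conversion

str_to_upper(string, locale = "")
str_to_lower(string, locale = "")
str_to_title(string, locale = "")

locale: the locale whose case-mapping rules are applied. Later stringr releases use locale = "en" as the default.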
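A minimal sketch of the three case-conversion functions, assuming the stringr package is installed. The sample string is made up for illustration.

```r
library(stringr)

x <- "the quick brown fox"

upper <- str_to_upper(x)             # "THE QUICK BROWN FOX"
lower <- str_to_lower("R Language")  # "r language"
title <- str_to_title(x)             # "The Quick Brown Fox"
```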

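2. invert_match: given a matrix of match locations (such as one returned by str_locate_all), returns the start and end positions of the parts that do not match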

3. modifiers: control how a pattern is interpreted

fixed(pattern, ignore_case = FALSE): Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.
coll(pattern, ignore_case = FALSE, locale = NULL, ...): Compare strings respecting standard collation rules.
regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...): Interpret the pattern as a regular expression. This is the default.
boundary(type = c("character", "line_break", "sentence", "word"), skip_word_none = TRUE, ...): Match boundaries between things.
pattern: Pattern to modify behaviour.
ignore_case: Should case differences be ignored in the match?
locale: Locale to use for comparisons. See stri_locale_list() for all possible options.
...: Other less frequently used arguments passed on to stri_opts_collator, stri_opts_regex, or stri_opts_brkiter.
multiline: If TRUE, ^ and $ match the beginning and end of each line. If FALSE, the default, they only match the start and end of the input.
comments: If TRUE, whitespace and comments beginning with # are ignored. Escape literal spaces with a backslash.
dotall: If TRUE, . will also match line terminators.
type: Boundary type to detect.
skip_word_none: Ignore "words" that don't contain any characters or numbers, i.e. punctuation.
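The difference between the modifiers is easiest to see on a pattern containing a dot. The following is a small sketch, assuming stringr is installed; the input vector is invented for illustration.

```r
library(stringr)

x <- c("a.b", "axb", "A.B")

# Default: the pattern is a regular expression, so "." matches any character.
as_regex <- str_detect(x, "a.b")                                # TRUE TRUE FALSE
# fixed(): the pattern is compared literally, so "." matches only a dot.
as_fixed <- str_detect(x, fixed("a.b"))                         # TRUE FALSE FALSE
# regex() with ignore_case = TRUE also accepts the upper-case element.
no_case <- str_detect(x, regex("a.b", ignore_case = TRUE))      # TRUE TRUE TRUE
# boundary("word") counts words instead of matching a character pattern.
n_words <- str_count("The quick brown fox.", boundary("word"))  # 4
```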

4. str_c: join strings

str_c(..., sep = "", collapse = NULL)
str_join(..., sep = "", collapse = NULL) (deprecated alias of str_c; prefer str_c)
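A short sketch of how sep and collapse behave in str_c, assuming stringr is installed. sep is placed between the arguments, while collapse joins the resulting vector into a single string.

```r
library(stringr)

joined <- str_c("a", "b", "c")                       # "abc"
with_sep <- str_c("x", 1:3, sep = "_")               # "x_1" "x_2" "x_3"
collapsed <- str_c(c("a", "b", "c"), collapse = ", ") # "a, b, c"
```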

5. str_conv: specify the encoding of a string

str_conv(string, encoding)

6. str_count: count the number of pattern matches in a string

str_count(string, pattern = "")

7. str_detect: detect whether a pattern is present in a string

str_detect(string, pattern)
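As an illustration of str_count and str_detect side by side (a sketch assuming stringr is installed, with a made-up vector), both are vectorised over the input and return one value per element:

```r
library(stringr)

fruit <- c("apple", "banana", "pear")

n_a <- str_count(fruit, "a")      # 1 3 1
has_an <- str_detect(fruit, "an") # FALSE TRUE FALSE
```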

8. str_dup: duplicate and concatenate the strings in a character vector

str_dup(string, times)

9. str_extract: extract matching patterns from a string

str_extract(string, pattern): extract the first match
str_extract_all(string, pattern, simplify = FALSE): extract all matches
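A brief sketch of the two extraction functions, assuming stringr is installed; the input strings are hypothetical. str_extract gives NA where nothing matches, and str_extract_all returns a list with one character vector per input.

```r
library(stringr)

x <- c("order 12 and 345", "no digits", "7 items")

first_num <- str_extract(x, "[0-9]+")     # "12" NA "7"
all_nums <- str_extract_all(x, "[0-9]+")  # list of character vectors
```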

10. str_length: the length of a string (number of characters)

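11. str_locate: locate the positions of pattern matches in a string

str_locate(string, pattern): returns the start and end position of the first match
str_locate_all(string, pattern): returns the start and end positions of all matches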
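The location matrices combine naturally with str_sub, and with invert_match from item 2 to get at the text between matches. A sketch under the assumption that stringr is installed; the sample string is invented.

```r
library(stringr)

numbers <- "1 and 22 and 333"

# Start and end of the first run of digits.
first_loc <- str_locate(numbers, "[0-9]+")
# One matrix per input string, one row per match.
all_loc <- str_locate_all(numbers, "[0-9]+")[[1]]
# Pull the matched text back out using the located positions.
digits <- str_sub(numbers, all_loc[, "start"], all_loc[, "end"])  # "1" "22" "333"

# invert_match() turns the match locations into the locations of the gaps.
gap_loc <- invert_match(all_loc)
gaps <- str_sub(numbers, gap_loc[, "start"], gap_loc[, "end"])
```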

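12. str_match: extract matched groups from a string

str_match(string, pattern): returns a character matrix for the first match, with the full match in the first column and one column per capture group
str_match_all(string, pattern): returns a list of such matrices, one per input string, covering all matches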
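To show the shape of the result, here is a small sketch with two capture groups, assuming stringr is installed. The key=value strings are illustrative only.

```r
library(stringr)

x <- c("key=value", "a=1", "none")

# Column 1 is the full match, columns 2 and 3 are the capture groups.
m <- str_match(x, "([a-z]+)=([a-z0-9]+)")
keys <- m[, 2]    # "key" "a" NA
values <- m[, 3]  # "value" "1" NA
```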

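13. str_order: order or sort a character vector

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "", ...): returns the integer indices that would sort x
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "", ...): returns the sorted vector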
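A quick sketch of the relationship between the two functions, assuming stringr is installed and using a made-up vector: indexing by str_order gives the same result as str_sort.

```r
library(stringr)

x <- c("banana", "apple", "cherry")

idx <- str_order(x)                       # 2 1 3
sorted <- str_sort(x)                     # "apple" "banana" "cherry"
reversed <- str_sort(x, decreasing = TRUE) # "cherry" "banana" "apple"
```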

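14. str_pad: pad a string at the start and/or end with a character (such as a space)

str_pad(string, width, side = c("left", "right", "both"), pad = " ")

width: minimum width of the padded string; strings already at least this wide are returned unchanged
side: the side on which padding is added; defaults to left
pad: the single character used for padding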
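The three parameters can be sketched as follows, assuming stringr is installed; the inputs are arbitrary examples.

```r
library(stringr)

left <- str_pad("7", width = 3, pad = "0")                    # "007"
right <- str_pad("ab", width = 6, side = "right", pad = "-")  # "ab----"
both <- str_pad("ab", width = 6, side = "both")               # "  ab  "
# A string longer than width is left as it is.
long <- str_pad("abcdef", width = 3)                          # "abcdef"
```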

15. str_replace: replace pattern matches in a string

str_replace(string, pattern, replacement): replaces the first match
str_replace_all(string, pattern, replacement): replaces all matches
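A sketch contrasting the two, assuming stringr is installed. The last line also illustrates that the replacement may refer to capture groups with \\1, \\2 and so on; the sample strings are hypothetical.

```r
library(stringr)

x <- "2016-06-22"

first_only <- str_replace(x, "-", "/")     # "2016/06-22"
every <- str_replace_all(x, "-", "/")      # "2016/06/22"
# Backreferences swap the two captured words.
swapped <- str_replace_all("John Smith", "(\\w+) (\\w+)", "\\2, \\1")  # "Smith, John"
```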

16. str_replace_na: turn missing values into the string "NA"

str_replace_na(string, replacement = "NA")
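One place this helps is before joining strings, because str_c returns NA whenever any piece is missing. A minimal sketch, assuming stringr is installed:

```r
library(stringr)

x <- c("a", NA, "b")

as_text <- str_replace_na(x)                       # "a" "NA" "b"
# Without the replacement, the NA propagates through str_c.
propagated <- str_c("id-", x)                      # "id-a" NA "id-b"
protected <- str_c("id-", str_replace_na(x, "?"))  # "id-a" "id-?" "id-b"
```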

17. str_split: split a string at a separator pattern

str_split(string, pattern, n = Inf): returns a list
str_split_fixed(string, pattern, n): returns a character matrix with n columns
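The list and matrix return types can be compared in a short sketch, assuming stringr is installed; the comma-separated inputs are made up.

```r
library(stringr)

x <- c("a,b,c", "d,e")

pieces <- str_split(x, ",")          # list: c("a", "b", "c") and c("d", "e")
limited <- str_split(x, ",", n = 2)  # at most two pieces: "a" and "b,c"
# The matrix form pads short rows with empty strings.
mat <- str_split_fixed(x, ",", n = 3)
```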

18. str_sub: extract or replace substrings of a character vector by position

str_sub(string, start = 1L, end = -1L): extract a substring
str_sub(string, start = 1L, end = -1L) <- value: replace a substring
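A small sketch of both forms, assuming stringr is installed. Negative positions count from the end of the string; the sample text is arbitrary.

```r
library(stringr)

x <- "Hello, world"

head_part <- str_sub(x, 1, 5)    # "Hello"
tail_part <- str_sub(x, -5, -1)  # "world"

# The replacement form modifies x in place.
str_sub(x, 1, 5) <- "HELLO"      # x is now "HELLO, world"
```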

19. str_subset: keep the elements of a character vector that match a pattern

str_subset(string, pattern)
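For illustration (assuming stringr is installed, with a made-up vector), str_subset returns the matching elements themselves rather than a logical vector:

```r
library(stringr)

fruit <- c("apple", "banana", "pear")

with_an <- str_subset(fruit, "an")   # "banana"
ends_in_e <- str_subset(fruit, "e$") # "apple"
```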

20. str_trim: remove whitespace from the start and end of a string

str_trim(string, side = c("both", "left", "right"))
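A brief sketch, assuming stringr is installed, showing that only leading and trailing whitespace is affected while interior spaces are kept:

```r
library(stringr)

x <- "  hi there  "

both <- str_trim(x)                      # "hi there"
left_only <- str_trim(x, side = "left")  # "hi there  "
right_only <- str_trim(x, side = "right") # "  hi there"
```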

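21. str_wrap: wrap a string into lines of a given width

str_wrap(string, width = 80, indent = 0, exdent = 0)

width: the target width of each line
indent: indentation of the first line
exdent: indentation of every line after the first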
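The effect of indent and exdent is easiest to see by printing the result. A sketch assuming stringr is installed; the sentence and the width of 20 are arbitrary choices.

```r
library(stringr)

txt <- "R provides many functions for working with character strings."

# The result is a single string with embedded newlines.
wrapped <- str_wrap(txt, width = 20, indent = 2, exdent = 4)
cat(wrapped, "\n")

# Split on the newlines to inspect the individual lines.
wrapped_lines <- strsplit(wrapped, "\n", fixed = TRUE)[[1]]
```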

22. word: extract words from a sentence

word(string, start = 1L, end = start, sep = fixed(" "))
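A minimal sketch of word, assuming stringr is installed. start and end are word positions, and negative values count from the last word.

```r
library(stringr)

s <- "The quick brown fox"

first_word <- word(s, 1)     # "The"
middle <- word(s, 2, 3)      # "quick brown"
last_word <- word(s, -1)     # "fox"
```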

II. String-handling functions in base R:

23. paste(): string concatenation

paste(..., sep = " ", collapse = NULL)
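A short base R sketch of sep and collapse in paste. Unlike str_c, the default separator is a single space.

```r
spaced <- paste("a", "b")                       # "a b"
no_sep <- paste("x", 1:3, sep = "")             # "x1" "x2" "x3"
collapsed <- paste(c("a", "b", "c"), collapse = "+")  # "a+b+c"
```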

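24. strsplit(): string splitting

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

split: the separator; interpreted as a regular expression unless fixed = TRUE
fixed: logical, default FALSE; if TRUE, split is matched as a literal string
perl: logical, default FALSE; if TRUE, Perl-compatible regular expressions are used
useBytes: logical, default FALSE; if TRUE, matching is done byte by byte rather than character by character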
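Because split is a regular expression by default, splitting on a dot needs fixed = TRUE (or escaping). A base R sketch with made-up inputs:

```r
# Literal split on ".".
literal <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]  # "a" "b" "c"
# As a regular expression, "." matches every character, leaving only empty pieces.
as_regex <- strsplit("a.b.c", ".")[[1]]               # "" "" "" "" ""
# A regular expression separator: one or more digits.
on_digits <- strsplit("a1b22c", "[0-9]+")[[1]]        # "a" "b" "c"
```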

25. nchar(): count the number of characters in a string

nchar(x, type = "chars", allowNA = FALSE)
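The type argument matters for non-ASCII text. In this base R sketch the two-character string is written with \u escapes, so it is stored as UTF-8, where each of these characters takes three bytes.

```r
lengths <- nchar(c("apple", "R", ""))  # 5 1 0

x <- "\u6570\u636e"                    # two CJK characters
n_chars <- nchar(x, type = "chars")    # 2
n_bytes <- nchar(x, type = "bytes")    # 6
```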

26. substr: extract and replace substrings

(1) substr(x, start, stop)

(2) substring(text, first, last = 1000000L)

(3) substr(x, start, stop) <- value

(4) substring(text, first, last = 1000000L) <- value
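A base R sketch of the extraction and replacement forms. substring is vectorised over first and last, which makes it handy for sliding windows; the sample string is arbitrary.

```r
x <- "abcdef"

mid <- substr(x, 2, 4)               # "bcd"
windows <- substring(x, 1:3, 3:5)    # "abc" "bcd" "cde"
rest <- substring(x, 4)              # "def"

# The replacement form overwrites the selected positions in x.
substr(x, 1, 2) <- "ZZ"              # x is now "ZZcdef"
```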

27. Character translation and case conversion

chartr(old, new, x)

tolower(x)

toupper(x)

casefold(x, upper = FALSE)
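A base R sketch of these four functions. chartr maps each character in old to the character at the same position in new.

```r
translated <- chartr("abc", "xyz", "aabbcc")  # "xxyyzz"
up <- toupper("abc")                          # "ABC"
low <- tolower("ABC")                         # "abc"
folded_low <- casefold("Abc")                 # "abc"
folded_up <- casefold("Abc", upper = TRUE)    # "ABC"
```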

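28. Pattern matching and replacement

(1) grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE): returns the indices of the elements of x that match

ignore.case: logical, default FALSE (matching is case-sensitive)
perl: logical, default FALSE (extended regular expressions are used); if TRUE, Perl-compatible regular expressions are used
value: logical, default FALSE; if FALSE the indices of the matching elements are returned, if TRUE the matching elements themselves
fixed: logical, default FALSE; if TRUE, pattern is matched as a literal string
useBytes: logical, default FALSE; if TRUE, matching is done byte by byte
invert: logical, default FALSE; if TRUE, the non-matching elements are returned instead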

(2) grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns a logical vector of the same length as x, TRUE for elements that match and FALSE for those that do not.
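The grep options and the grepl return type can be sketched in base R as follows; the input vector is invented for illustration.

```r
x <- c("apple", "Banana", "cherry", "a.b")

idx <- grep("an", x)                          # 2
vals <- grep("an", x, value = TRUE)           # "Banana"
no_case <- grep("^b", x, ignore.case = TRUE)  # 2
# fixed = TRUE treats "." as a literal dot.
literal <- grep(".", x, fixed = TRUE)         # 4
others <- grep("an", x, invert = TRUE)        # 1 3 4
# grepl gives one logical value per element.
flags <- grepl("an", x)                       # FALSE TRUE FALSE FALSE
```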

(3) sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): replaces the first match in each element

(4) gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): replaces all matches in each element
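A base R sketch of the difference between sub and gsub, with a backreference and a literal replacement added for illustration; the strings are hypothetical.

```r
first_only <- sub("a", "A", "banana")   # "bAnana"
every <- gsub("a", "A", "banana")       # "bAnAnA"
# \\1 and \\2 refer to the parenthesized groups in the pattern.
swapped <- gsub("(\\w+)@(\\w+)", "\\2 at \\1", "user@example")  # "example at user"
# fixed = TRUE replaces literal dots.
dashed <- gsub(".", "-", "a.b.c", fixed = TRUE)                  # "a-b-c"
```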

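(5) regexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns, for each element, the position of the first match and its length in characters; for elements with no match, both the position and the length are -1.

(6) gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns, for each element, the positions of all matches and their lengths in characters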

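(7) regexec(pattern, text, ignore.case = FALSE, fixed = FALSE, useBytes = FALSE): returns, for each element, the position and length of the first match and of each parenthesized subexpression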
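The position objects from (5) to (7) are usually passed to the base R function regmatches to get the matched text. A sketch with a made-up input vector:

```r
x <- c("tel: 555-1234", "none")

# First match per element: position 6, length 3; -1 where nothing matches.
m <- regexpr("[0-9]+", x)
pos <- as.integer(m)                     # 6 -1
len <- attr(m, "match.length")           # 3 -1
first_text <- regmatches(x, m)           # "555" (unmatched elements are dropped)

# All matches per element, returned as a list.
all_text <- regmatches(x, gregexpr("[0-9]+", x))

# Full match plus capture groups for the first match.
groups <- regmatches(x, regexec("([0-9]+)-([0-9]+)", x))
```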

Originally published: 2016-06-22, on the CDA数据分析师 WeChat official account.