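I. String functions in the stringr package:
1. Case conversion
str_to_upper(string, locale = "")
str_to_lower(string, locale = "")
str_to_title(string, locale = "")
2. invert_match: turn the locations of matches into the locations of the non-matching text between them
invert_match(loc)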
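A short sketch of items 1 and 2, assuming stringr is installed; the variable names are illustrative. invert_match() is usually fed the match matrix returned by str_locate_all(). Here the matches are the spaces, so the inverted locations cover the words between them.

```r
library(stringr)

x <- "the quick brown fox"

# Case conversion
upper <- str_to_upper(x)      # "THE QUICK BROWN FOX"
title <- str_to_title(x)      # "The Quick Brown Fox"
lower <- str_to_lower(upper)  # back to "the quick brown fox"

# Locations of every space: an integer matrix with columns "start" and "end"
loc <- str_locate_all(x, " ")[[1]]

# invert_match() returns the locations of the text between matches;
# the first row starts at 0 and the last row ends at -1 (end of string)
gaps <- invert_match(loc)
gaps[, "start"]  # 0 5 11 17
gaps[, "end"]    # 3 9 15 -1

# Extract the non-matching pieces, i.e. the words between the spaces
words <- str_sub(x, gaps[, "start"], gaps[, "end"])
```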
3. modifiers: control how a pattern is interpreted
fixed(pattern, ignore_case = FALSE): compare literal bytes in the string. This is very fast, but usually not what you want for non-ASCII character sets.
coll(pattern, ignore_case = FALSE, locale = NULL, ...): compare strings respecting standard collation rules.
regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...): the default; a plain string pattern is treated as a regular expression.
boundary(type = c("character", "line_break", "sentence", "word"), skip_word_none = TRUE, ...): match boundaries between things.
Arguments:
pattern: the pattern whose behaviour is modified.
ignore_case: should case differences be ignored in the match?
locale: locale to use for comparisons. See stri_locale_list() for all possible options.
...: other, less frequently used arguments passed on to stri_opts_collator, stri_opts_regex, or stri_opts_brkiter.
multiline: if TRUE, ^ and $ match the beginning and end of each line. If FALSE (the default), they match only the start and end of the input.
comments: if TRUE, whitespace and comments beginning with # are ignored. Escape literal spaces with a backslash (\ ).
dotall: if TRUE, . also matches line terminators.
type: the boundary type to detect.
skip_word_none: ignore "words" that contain no letters or numbers, i.e. punctuation.
4. str_c: join strings
str_c(..., sep = "", collapse = NULL)
str_join(..., sep = "", collapse = NULL)  (an older, deprecated alias of str_c)
5. str_conv: specify the encoding of a string
str_conv(string, encoding)
6. str_count: count the number of matches of a pattern in a string
str_count(string, pattern = "")
7. str_detect: detect whether a pattern is present in a string
str_detect(string, pattern)
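A sketch exercising items 3 to 7, assuming stringr is installed; the fruits vector is made-up sample data. It shows sep versus collapse in str_c(), and how wrapping a pattern in fixed() or regex() changes what str_count() and str_detect() match.

```r
library(stringr)

fruits <- c("apple", "Banana", "cherry.pie")

# str_c(): sep goes between the arguments, collapse joins the result into one string
pairs  <- str_c(c("x", "y"), 1:2, sep = "_")  # "x_1" "y_2"
joined <- str_c(fruits, collapse = ", ")     # "apple, Banana, cherry.pie"

# A bare string is a regular expression, so "." matches any character
any_char <- str_count(fruits, ".")           # 5 6 10
# fixed() matches a literal "." instead
dots <- str_count(fruits, fixed("."))        # 0 0 1

# Matching is case-sensitive unless the modifier says otherwise
has_b          <- str_detect(fruits, "b")                               # FALSE FALSE FALSE
has_b_any_case <- str_detect(fruits, regex("b", ignore_case = TRUE))    # FALSE TRUE FALSE
```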
8. str_dup: duplicate and concatenate strings within a character vector
str_dup(string, times)
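A minimal illustration of str_dup(), assuming stringr is installed: both string and times are vectorised, so a single string can be repeated a different number of times per element.

```r
library(stringr)

rep3   <- str_dup("ab", 3)                # "ababab"
ruler  <- str_dup("-", 0:3)               # ""  "-"  "--"  "---"
paired <- str_dup(c("x", "yz"), c(2, 3))  # "xx" "yzyzyz"
```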
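9. str_extract: extract matching patterns from a string
str_extract(string, pattern): extracts the first match in each string
str_extract_all(string, pattern, simplify = FALSE): extracts all matches
10. str_length: the number of characters in a string
str_length(string)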
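A sketch of items 9 and 10, assuming stringr is installed; the input strings are invented. Note the different return shapes: str_extract() gives a character vector with NA for no match, str_extract_all() gives a list, and simplify = TRUE turns that list into a matrix padded with "".

```r
library(stringr)

s <- c("tel: 010-1234", "no digits here", "a1b22c333")

first_num <- str_extract(s, "[0-9]+")      # "010" NA "1"
all_nums  <- str_extract_all(s, "[0-9]+")  # list(c("010", "1234"), character(0), c("1", "22", "333"))
num_mat   <- str_extract_all(s, "[0-9]+", simplify = TRUE)  # 3 x 3 character matrix

lens <- str_length(s)                      # 13 14 9
str_length(NA)                             # NA (base nchar(NA) returns 2)
```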
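11. str_locate: locate the positions of pattern matches in a string
str_locate(string, pattern): returns the start and end positions of the first match
str_locate_all(string, pattern): returns the positions of all matches
12. str_match: extract matched groups from a string
str_match(string, pattern): extracts the first match together with its capture groups (a character matrix: the full match, then one column per group)
str_match_all(string, pattern): extracts all matches and their capture groups
13. str_order: order or sort a character vector
str_order(x, decreasing = FALSE, na_last = TRUE, locale = "", ...): returns the ordering indices
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "", ...): returns the sorted vector
14. str_pad: pad a string with characters (such as spaces) at the start and/or end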
str_pad(string, width, side = c("left", "right", "both"), pad = " ")
width: the minimum width of the padded string; side: where to add the padding, "left" by default; pad: the padding character.
15. str_replace: replace matches of a pattern in a string
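str_replace(string, pattern, replacement): replaces the first match
str_replace_all(string, pattern, replacement): replaces all matches
16. str_replace_na: turn missing values into the string "NA"
str_replace_na(string, replacement = "NA")
17. str_split: split a string into pieces at a separator pattern
str_split(string, pattern, n = Inf)  # returns a list
str_split_fixed(string, pattern, n)  # returns a character matrix
18. str_sub: extract or replace substrings of a character vector by position
str_sub(string, start = 1L, end = -1L): extract substrings
str_sub(string, start = 1L, end = -1L) <- value: replace substrings
19. str_subset: keep the elements of a character vector that match a pattern
str_subset(string, pattern)
20. str_trim: remove whitespace from the start and/or end of a string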
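str_trim(string, side = c("both", "left", "right"))
21. str_wrap: wrap strings into formatted paragraphs
str_wrap(string, width = 80, indent = 0, exdent = 0)
width: the target width of each line; indent: indentation of the first line; exdent: indentation of every line after the first.
22. word: extract words from a sentence
word(string, start = 1L, end = start, sep = fixed(" "))
II. String functions in base R: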
23. paste(): concatenate strings
paste(..., sep = " ", collapse = NULL)
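A base-R sketch of paste(); no packages are needed. It contrasts the default sep = " " with sep = "", shows paste0() (base R's shorthand for sep = ""), and uses collapse to fold the element-wise result into a single string.

```r
p1 <- paste("a", "b")                                     # "a b"  (default sep = " ")
p2 <- paste("a", "b", sep = "")                           # "ab"
p3 <- paste0("x", 1:3)                                    # "x1" "x2" "x3"
p4 <- paste(c("x", "y"), 1:2, sep = "_", collapse = "+")  # "x_1+y_2"
```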
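24. strsplit(): split strings
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
split: the separator; by default it is interpreted as a regular expression.
fixed: logical, default FALSE; if TRUE, split is matched literally.
perl: logical, default FALSE; if TRUE, split is interpreted as a Perl-compatible (PCRE) regular expression.
useBytes: logical, default FALSE; if TRUE, matching is done byte by byte rather than character by character.
25. nchar(): count the characters in a string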
nchar(x, type = "chars", allowNA = FALSE)
26. substr(): extract and replace substrings
(1) substr(x, start, stop)
(2) substring(text, first, last = 1000000L)
(3) substr(x, start, stop) <- value
(4) substring(text, first, last = 1000000L) <- value
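A base-R sketch of items 24 to 26 with made-up inputs. It highlights two easy traps: strsplit() always returns a list (and treats split as a regex unless fixed = TRUE), and the substr() replacement form never changes the string's length.

```r
# strsplit() returns a list with one element per input string
parts <- strsplit("2024-01-15", "-")[[1]]           # "2024" "01" "15"

# "." is a regex metacharacter, so split on it literally with fixed = TRUE
dots <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]   # "a" "b" "c"

n <- nchar(c("apple", "", "hi there"))              # 5 0 8

s <- "abcdef"
mid   <- substr(s, 2, 4)         # "bcd"
slide <- substring(s, 1:3, 3:5)  # "abc" "bcd" "cde"
rest  <- substring(s, 4)         # "def" (last defaults to 1000000L)

# Replacement overwrites in place; only as many characters as the
# replacement supplies are changed, so the length stays the same
t1 <- s; substr(t1, 1, 2) <- "XY"  # "XYcdef"
t2 <- s; substr(t2, 1, 3) <- "Z"   # "Zbcdef"
```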
27. Character translation and case conversion:
chartr(old, new, x)
tolower(x)
toupper(x)
casefold(x, upper = FALSE)
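A base-R sketch of item 27. chartr() maps characters one to one (each character of old becomes the character at the same position in new), while casefold() is a wrapper that calls tolower() or toupper() depending on upper.

```r
swapped <- chartr("abc", "xyz", "aabbcc")   # "xxyyzz"
up      <- toupper("Hello, World")          # "HELLO, WORLD"
down    <- tolower("Hello, World")          # "hello, world"
cf_up   <- casefold("Hello", upper = TRUE)  # "HELLO"
cf_down <- casefold("Hello")                # "hello"
```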
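28. Pattern matching and replacement
(1) grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE): returns the indices of the elements of x that match
ignore.case: logical, default FALSE (matching is case-sensitive);
perl: logical, default FALSE; if TRUE, Perl-compatible (PCRE) regular expressions are used instead of the default POSIX extended regular expressions;
value: logical, default FALSE; whether to return the indices of the matching elements (FALSE) or the elements themselves (TRUE);
fixed: logical, default FALSE; if TRUE, pattern is matched literally as a string;
useBytes: logical, default FALSE; if TRUE, matching is done byte by byte;
invert: logical, default FALSE; if TRUE, the non-matching elements are returned instead.
(2) grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns a logical vector the same length as x, TRUE for elements that match and FALSE otherwise.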
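(3) sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): replaces the first match in each element
(4) gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): replaces all matches in each element
(5) regexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns, for each element, the starting position of the first match, with the match length in the "match.length" attribute; elements with no match get -1 for both.
(6) gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns a list giving, for each element, the starting positions and lengths of all matches
(7) regexec(pattern, text, ignore.case = FALSE, fixed = FALSE, useBytes = FALSE): returns a list giving, for each element, the position of the first match and of each parenthesized subexpression (capture group); regmatches() extracts the corresponding text.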
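A base-R sketch of the item 28 family on one invented input vector, so the return types can be compared side by side. regmatches() is used to turn the position results of regexpr() and regexec() back into text.

```r
x <- c("apple123", "banana", "cherry7")

idx      <- grep("[0-9]", x)                              # 1 3
vals     <- grep("[0-9]", x, value = TRUE)                # "apple123" "cherry7"
no_digit <- grep("[0-9]", x, value = TRUE, invert = TRUE) # "banana"
has_an   <- grepl("an", x)                                # FALSE TRUE FALSE

first_a <- sub("a", "A", x)   # "Apple123" "bAnana" "cherry7"
all_a   <- gsub("a", "A", x)  # "Apple123" "bAnAnA" "cherry7"

# regexpr(): first match position, lengths in attr "match.length", -1 if none
r <- regexpr("[0-9]+", x)
as.integer(r)              # 6 -1 7
attr(r, "match.length")    # 3 -1 1
nums <- regmatches(x, r)   # "123" "7" (non-matching elements are dropped)

# gregexpr(): every match position, one list element per input string
an_pos <- as.integer(gregexpr("an", "banana")[[1]])  # 2 4

# regexec(): full match plus capture groups
groups <- regmatches(x, regexec("([a-z]+)([0-9]+)", x))
# groups[[1]] is "apple123" "apple" "123"; groups[[2]] is character(0)
```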