
Regular Expressions in Python

IT架构圈
Published 2018-06-01 12:23:46
Included in the column: IT架构圈

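Regular expressions are mostly used as filter conditions. First decide what information you want to pick out and how it differs from everything around it (find its distinguishing feature), then translate that feature into a pattern using the list of regex symbols. Almost every programming language supports them; on Linux they most often show up when processing text with grep.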

You hardly need to memorize Python's libraries. To look something up, run import x and then dir(x).

#for linux
$ grep '^From:' mbox-short.txt
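
The same filter can be approximated in Python with re.search. This is a minimal sketch, not part of the original article: grep_lines is a hypothetical helper name, and the sample lines are made up to stand in for the contents of mbox-short.txt.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines in which the pattern is found, roughly like grep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Made-up sample lines standing in for the contents of mbox-short.txt.
sample = [
    "From: alice@example.com",
    "Subject: hello",
    "Reply sent From: the office",
    "From: bob@example.com",
]

# ^ anchors the pattern to the start of each line, so the third line is skipped.
matches = grep_lines(r"^From:", sample)
print(matches)
```

To run it against a real file, the list could be replaced by the lines read from that file, with trailing newlines stripped.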

Below are some common symbols and usages for Python's re module, taken from py4e.

^ Matches the beginning of the line.
$ Matches the end of the line.
. Matches any character (a wildcard).
\s Matches a whitespace character.
\S Matches a non-whitespace character (opposite of \s).
* Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s).
*? Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s) in "non-greedy mode".
+ Applies to the immediately preceding character and indicates to match one or more of the preceding character(s).
+? Applies to the immediately preceding character and indicates to match one or more of the preceding character(s) in "non-greedy mode".
[aeiou] Matches a single character as long as that character is in the specified set. In this example, it would match "a", "e", "i", "o", or "u", but no other characters.
[a-z0-9] You can specify ranges of characters using the minus sign. This example is a single character that must be a lowercase letter or a digit.
[^A-Za-z] When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter.
( ) When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall().
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
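\d Matches any decimal digit. For ASCII text this is equivalent to the set [0-9]; with Python 3 str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is used.
\D Matches any non-digit character, the inverse of \d; for ASCII text this is equivalent to the set [^0-9].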
greedy matching
The notion that the "+" and "*" characters in a regular expression expand outward to match the largest possible string.
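
Several of the symbols in this list can be tried out directly. The following is a small illustrative sketch; the sample strings and variable names are made up for the example and do not come from py4e.

```python
import re

# Made-up sample text for illustration.
text = "Order 66 shipped to Ann on 2018-06-01"

starts_with_order = bool(re.search(r"^Order", text))  # ^ anchors at the start
ends_with_digit = bool(re.search(r"\d$", text))       # $ anchors at the end
numbers = re.findall(r"\d+", text)                    # + means one or more digits
vowels = re.findall(r"[aeiou]", "regex")              # any one character from the set
non_letters = re.findall(r"[^A-Za-z]", "a1b2")        # caret inverts the set
words_an = re.findall(r"\bAn\S*", text)               # word boundary, then non-whitespace

print(starts_with_order, ends_with_digit)
print(numbers, vowels, non_letters, words_an)
```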

Use dir to list the names a module contains

>>> import re
>>> dir(re)
[.. 'compile', 'copy_reg', 'error', 'escape', 'findall',
'finditer', 'match', 'purge', 'search', 'split', 'sre_compile',
'sre_parse', 'sub', 'subn', 'sys', 'template']
>>> help (re.search)
Help on function search in module re:

[Image: MATCH1.png]

[Image: MATCH2.png]
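
As a quick illustration of re.search, here is a minimal sketch on a made-up sample line (the address and variable names are assumptions for the example). re.search scans the whole string and returns a match object, or None when nothing matches; re.match only tries at the start of the string.

```python
import re

# Made-up sample line for illustration.
line = "From: alice@example.com Sat Jan  5 09:14:16 2008"

# re.search looks for the pattern anywhere in the string.
match = re.search(r"\S+@\S+", line)
found = match.group() if match else None

# re.match only succeeds if the pattern matches at the very start,
# and the line starts with "From:", which contains no "@".
start_only = re.match(r"\S+@\S+", line)

# A pattern that does not occur gives None rather than raising an error.
missing = re.search(r"xyz", line)

print(found, start_only, missing)
```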

Finding all matches

re.findall
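
A minimal sketch of re.findall, including the effect of parentheses described in the symbol list. The sample text is made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "\n".join([
    "From: alice@example.com",
    "X-Confidence: 0.8475",
    "From: bob@example.org",
])

# Without parentheses, findall returns each whole match.
# re.MULTILINE makes ^ match at the start of every line, not just the string.
whole = re.findall(r"^From: \S+@\S+", text, flags=re.MULTILINE)

# With parentheses, findall returns only the parenthesized part.
addresses = re.findall(r"^From: (\S+@\S+)", text, flags=re.MULTILINE)

print(whole)
print(addresses)
```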

Greedy matching expands outward until it has matched as much as it can.

[Image: greedy-matching.png]

Non-greedy matching finds the shortest string that fits the pattern.
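
The difference is easiest to see side by side. A minimal sketch on a sample line with two colons (the line and variable names are illustrative):

```python
import re

# Sample line containing two colons.
line = "From: Using the : character"

# Greedy: .+ grabs as much as possible, so the match runs to the last colon.
greedy = re.findall(r"^F.+:", line)

# Non-greedy: .+? grabs as little as possible, so the match stops at the first colon.
lazy = re.findall(r"^F.+?:", line)

print(greedy)
print(lazy)
```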

Originally published 2018-04-03 on the WeChat official account 编程坑太多.
