# Column 003: Regular Expressions


Following the 80/20 rule, this article explains only the most frequently used syntax.


## 2: Concepts

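• What is a regular expression? It is a pattern, written in a compact syntax, that describes a set of strings; programs use it to search for, validate, extract, and replace text.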

• Seeing is believing

`\bhi\b.*\bLucy\b` is a regular expression: it matches the word "hi" followed, later on the same line, by the word "Lucy".
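To see it in action, here is a minimal sketch using Python's standard `re` module; the three sample strings are made up for illustration.

```python
import re

# \b = word boundary, .* = any characters except a newline,
# so this finds the whole word "hi" followed later by the whole word "Lucy".
pattern = r"\bhi\b.*\bLucy\b"

samples = [
    "hi, this is Lucy speaking",  # matches "hi, this is Lucy"
    "him and Lucy",               # no match: "hi" is only part of "him"
    "hi there, Lucyna",           # no match: "Lucy" is only part of "Lucyna"
]

for s in samples:
    m = re.search(pattern, s)
    print(repr(s), "->", m.group() if m else None)
```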

## 3: Syntax

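| Syntax | Meaning | Example |
|---|---|---|
| literal | The characters themselves | `wuxiaoshen` |
| `.` | Any single character except a newline | `wu.iaoshen` |
| `^` | Start of the string (of each line with `re.M`) | `^wuxiaoshen` |
| `$` | End of the string (of each line with `re.M`) | `wuxiaoshen$` |
| `*` | The preceding item, zero or more times | `wu*xiaoshen` |
| `+` | The preceding item, one or more times | `wu+xiaoshen` |
| `?` | The preceding item, zero or one time | `wu?xiaoshen` |
| `{N}` | The preceding item, exactly N times | `[0-9]{2}` |
| `{M,N}` | The preceding item, M to N times | `[0-9]{3,8}` |
| `[ ]` | Any one character from the set | `wu[xyz]iaoshen` |
| `[x-y]` | Any one character in the range x to y | `[0-9]` |
| `[^...]` | Any one character not in the set | `[^0-9]` |
| `()` | Group; also captures the matched text | `(wuxiaoshen)` |
| `\d` | A digit | `data\d\.txt` |
| `\w` | A word character (letter, digit, underscore) | `[wuxiao]\w+` |
| `\s` | A whitespace character | `of\sthe` |
| `\b` | A word boundary | `\bwuxiaoshen\b` |
| `\D` | A non-digit (opposite of `\d`) | |
| `\W` | A non-word character (opposite of `\w`) | |
| `\S` | A non-whitespace character (opposite of `\s`) | |
| `\B` | Not a word boundary (opposite of `\b`) | |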
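A quick way to check the rows above is to run several of them through `re.findall`, which returns every non-overlapping match. This is a small sketch with made-up sample strings; it writes the `.txt` example as `data\d\.txt`, escaping the dot so that it matches only a literal `.`.

```python
import re

# (pattern from the table, text to search); re.findall returns all matches
examples = [
    (r"wu.iaoshen", "wuxiaoshen wu-iaoshen"),        # . = any one character
    (r"^wuxiaoshen", "wuxiaoshen says hi"),          # ^ = start of string
    (r"wuxiaoshen$", "hi from wuxiaoshen"),          # $ = end of string
    (r"wu*xiaoshen", "wxiaoshen wuuxiaoshen"),       # u zero or more times
    (r"wu+xiaoshen", "wxiaoshen wuuxiaoshen"),       # u one or more times
    (r"wu?xiaoshen", "wxiaoshen wuxiaoshen"),        # u optional
    (r"[0-9]{2}", "a1b23c456"),                      # exactly two digits
    (r"[0-9]{3,8}", "12 3456 123456789"),            # 3 to 8 digits, greedy
    (r"wu[xyz]iaoshen", "wuxiaoshen wuaiaoshen"),    # one character from the set
    (r"[^0-9]", "a1b2"),                             # any non-digit
    (r"data\d\.txt", "data1.txt data12.txt"),        # \d = one digit, \. = literal dot
    (r"of\sthe", "of the, of-the"),                  # \s = whitespace
    (r"\bwuxiaoshen\b", "wuxiaoshen wuxiaoshen2"),   # \b = word boundary
]

for pattern, text in examples:
    print(f"{pattern:<16} {text!r:<28} -> {re.findall(pattern, text)}")
```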

• Seeing is believing
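```python
import re

text = "http://mindhacks.cn/"          # the string to search in
mind_pattern_1 = "mind"               # literal match
mind_pattern_2 = "[m].*?d"            # "m", as few characters as possible, then "d"
mind_pattern_3 = r"//(.*?)h"          # capture what lies between "//" and the first "h"
mind_pattern_4 = r"[mind]{4}"         # four characters in a row, each one of m, i, n, d

for p in (mind_pattern_1, mind_pattern_2, mind_pattern_3, mind_pattern_4):
    print(p, re.findall(p, text))     # each prints ['mind']
```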
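`mind_pattern_2` and `mind_pattern_3` use the lazy quantifier `*?`. Its difference from the greedy `*` shows up once the text contains the closing marker more than once; the HTML-like string below is made up for the comparison.

```python
import re

text = "<b>mind</b> and <b>hacks</b>"

# Greedy: .* takes as much as possible, so the match runs to the LAST </b>
greedy = re.findall(r"<b>(.*)</b>", text)

# Lazy: .*? takes as little as possible, so each <b>...</b> pair matches separately
lazy = re.findall(r"<b>(.*?)</b>", text)

print(greedy)  # ['mind</b> and <b>hacks']
print(lazy)    # ['mind', 'hacks']
```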

• Examples:

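```python
# QQ number: first digit 1-9, at least 5 digits in total
QQnumber_pattern = r'[1-9][0-9]{4,}'

# Mainland China mobile number: optional leading 0, a listed prefix, then 8 digits
# (this prefix list may not cover newer prefixes)
tellnumber_pattern = r'0?(13[0-9]|15[012356789]|17[0678]|18[0-9]|14[57])[0-9]{8}'

# Strict IPv4 address: four numbers from 0 to 255, separated by literal dots
IPnumber_pattern = r'(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)'

# Loose IPv4 address: four runs of digits separated by dots (also accepts 999.1.1.1)
IPnumber_pattern_2 = r'\d+\.\d+\.\d+\.\d+'
```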
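These patterns carry no `^`/`$` anchors, so `re.search` would also accept a valid-looking piece of a longer string. A minimal sketch, assuming Python 3.4+ for `re.fullmatch`, which requires the entire input to match; `is_valid` is a hypothetical helper and the inputs are made up.

```python
import re

QQnumber_pattern = r'[1-9][0-9]{4,}'
tellnumber_pattern = r'0?(13[0-9]|15[012356789]|17[0678]|18[0-9]|14[57])[0-9]{8}'
IPnumber_pattern = r'(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)'
IPnumber_pattern_2 = r'\d+\.\d+\.\d+\.\d+'


def is_valid(pattern, s):
    """Hypothetical helper: True only if the WHOLE string s matches pattern."""
    return re.fullmatch(pattern, s) is not None


print(is_valid(QQnumber_pattern, "10001"))          # True
print(is_valid(QQnumber_pattern, "01234"))          # False: cannot start with 0
print(is_valid(tellnumber_pattern, "13812345678"))  # True
print(is_valid(tellnumber_pattern, "12812345678"))  # False: 128 is not a listed prefix
print(is_valid(IPnumber_pattern, "192.168.1.1"))    # True
print(is_valid(IPnumber_pattern, "256.1.1.1"))      # False: 256 > 255
print(is_valid(IPnumber_pattern_2, "256.1.1.1"))    # True: the loose pattern only checks digits
```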

## 4: Code Examples

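• `re.match(pattern, string, flags=0)`: try to match only at the start of the string; returns a match object or `None`

• `re.search(pattern, string, flags=0)`: scan the whole string and return the first match, or `None`

• `re.findall(pattern, string, flags=0)`: return all non-overlapping matches as a list (the captured groups instead, if the pattern has any)

• `re.split(pattern, string, maxsplit=0, flags=0)`: split the string wherever the pattern matches

• `re.sub(pattern, repl, string, count=0, flags=0)`: replace every match with `repl`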
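A short sketch calling each of these functions on one made-up string, using only the standard `re` module:

```python
import re

text = "Tom: 25, Jerry: 7"

print(re.match(r"\w+", text).group())     # Tom: match only tries position 0
print(re.match(r"\d+", text))             # None: the string does not start with a digit
print(re.search(r"\d+", text).group())    # 25: first match anywhere
print(re.findall(r"\d+", text))           # ['25', '7']
print(re.split(r",\s*", text))            # ['Tom: 25', 'Jerry: 7']
print(re.sub(r"\d+", "#", text))          # Tom: #, Jerry: #
print(re.findall(r"jerry", text, flags=re.IGNORECASE))  # ['Jerry']
```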

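```python
# Suppose you are interested in the titles of the articles on this blog's home page.
import re
import requests

# Blog home page (inferred from the links in the sample HTML below)
url_one = "http://www.geekonomics10000.com/"

# Narrow the range first, then match inside the narrowed range
html = requests.get(url_one)
response = html.text
# A title block on the page looks like this:
#<h3 id="post-967" class="post-title"><a href="http://www.geekonomics10000.com/967" rel="bookmark" title="Permanent Link to 特朗普是极右狂人？其实共和党候选人里，他最温和">特朗普是极右狂人？其实共和党候选人里，他最温和</a></h3>

# Step 1: capture the contents of every <h3 id=...>...</h3> block
content = r'h3\sid(.*?)</h3>'

# Step 2 works on the tail of each block, which looks like:
#title="Permanent Link to 特朗普是极右狂人？其实共和党候选人里，他最温和">特朗普是极右狂人？其实共和党候选人里，他最温和</a>

# Step 2: the title text sits between the '>' that closes the <a ...> tag and </a>
little_title = r'title=.*?>(.*?)</a>'

all_title = re.findall(content, response, re.S)   # re.S lets . match newlines too
title_content = []
for block in all_title:
    title_content.extend(re.findall(little_title, block, re.S))
for one in title_content:
    print(one)
```

Output (excerpt; the title below means "Book recommendations for New Year 2016"):

```
2016新年荐书
```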
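Because the live page may have changed since this was written, the same two-step idea (narrow to each `<h3>` block, then extract the title inside it) can be checked offline. The HTML fragment and `example.com` URLs below are hypothetical, modeled on the sample `<h3>` block in the comments above.

```python
import re

# Hypothetical HTML fragment modeled on the sample <h3> block
page = '''
<h3 id="post-1" class="post-title"><a href="http://example.com/1" rel="bookmark"
 title="Permanent Link to First post">First post</a></h3>
<p>body text</p>
<h3 id="post-2" class="post-title"><a href="http://example.com/2" rel="bookmark"
 title="Permanent Link to Second post">Second post</a></h3>
'''

content = r'h3\sid(.*?)</h3>'          # step 1: each <h3 id=...> ... </h3> block
little_title = r'title=.*?>(.*?)</a>'  # step 2: text between the tag's '>' and </a>

titles = []
for block in re.findall(content, page, re.S):  # re.S lets . match newlines
    titles.extend(re.findall(little_title, block, re.S))

print(titles)  # ['First post', 'Second post']
```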

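```python
# Suppose you want to match the course images on the home page.
# -*- coding:utf-8 -*-
# To: regular expression
# Author: wuxiaoshen

import os
import re
import requests


class TestRe(object):
    """
    Use a regular expression to grab the images on the imooc course page
    and download them into the "002 JPG" folder.
    """

    def __init__(self, url="http://www.imooc.com/course/list", folder="002 JPG"):
        self.url = url
        self.folder = folder

    def run(self):
        response = requests.get(self.url).text
        # Lazy match that stays inside one URL (no quotes or whitespace)
        # and ends at a literal ".jpg"
        listurl = re.findall(r'http://[^"\'\s]+?\.jpg', response)
        print(listurl)

        os.makedirs(self.folder, exist_ok=True)  # create the folder if it does not exist
        for i, one in enumerate(listurl):
            cont = requests.get(one)
            print(cont)
            # "with" closes the file automatically; no f.close() needed
            with open(os.path.join(self.folder, str(i) + ".jpg"), "wb") as f:
                f.write(cont.content)


if __name__ == "__main__":
    TestRe().run()
```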
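A pattern like `http://.+.jpg` is greedy and leaves the dot before `jpg` unescaped, so when two image URLs sit on the same line it returns one long match that covers both. The sketch below compares it with a tighter pattern on a made-up HTML string.

```python
import re

# Hypothetical HTML with two images on one line
html = '<img src="http://img.example.com/a.jpg"><img src="http://img.example.com/b.jpg">'

# Greedy .+ plus an unescaped dot: a single match spanning both URLs
greedy = re.findall(r'http://.+.jpg', html)

# No quotes or whitespace allowed, lazy up to a literal ".jpg": one match per URL
tight = re.findall(r'http://[^"\'\s]+?\.jpg', html)

print(greedy)  # ['http://img.example.com/a.jpg"><img src="http://img.example.com/b.jpg']
print(tight)   # ['http://img.example.com/a.jpg', 'http://img.example.com/b.jpg']
```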

## 5: References and Notes

1. The Find dialog of the Notepad++ text editor supports regular-expression matching.


2. There are also online regular-expression testing tools.
3. Chrome also has a regex-matching extension, Regular Expression Checker.
