'''
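In Python 3.3, the html module provides only one function:
html.escape(s, quote=True)
It converts the special characters &, <, and > in a string to HTML-safe sequences; when quote is True (the default), " and ' are converted as well.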
'''
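To make the effect of the quote parameter concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not part of the demo.

```python
import html

# Hypothetical sample string containing all five special characters.
sample = '<a href="x.html?a=1&b=2">Tom\'s page</a>'

# Default (quote=True): &, <, > and both quote characters are replaced.
escaped_all = html.escape(sample)

# quote=False: only &, < and > are replaced; quotes are left untouched.
escaped_no_quotes = html.escape(sample, quote=False)

print(escaped_all)
print(escaped_no_quotes)
```

With quote=False the result is only safe to place in element content, not inside an attribute value delimited by quotes.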
Here is a demo I wrote.
Output:
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
Source HTML file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title> Python Html module </title>
<meta name="Generator" content="EditPlus">
<meta name="Author" content="Hongten">
<meta name="Keywords" content="hongten,python">
<meta name="Description" content="this blogs is about python">
</head>
<body>
<table border = "1">
<tr>
<td>
Author
</td>
<td>
Hongten
</td>
<td>
Mail
</td>
<td>
hongtenzone@foxmail.com
</td>
</tr>
<tr>
<td>
Blos
</td>
<td>
<a href="http://www.blogs.com/hongten">http://www.blogs.com/hongten</a>
</td>
<td>
QQ
</td>
<td>
648719819
</td>
</tr>
</table>
</body>
</html>
##################################################
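Escaped HTML file:
&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01 Transitional//EN&quot; &quot;http://www.w3.org/TR/html4/loose.dtd&quot;&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt; Python Html module &lt;/title&gt;
&lt;meta name=&quot;Generator&quot; content=&quot;EditPlus&quot;&gt;
&lt;meta name=&quot;Author&quot; content=&quot;Hongten&quot;&gt;
&lt;meta name=&quot;Keywords&quot; content=&quot;hongten,python&quot;&gt;
&lt;meta name=&quot;Description&quot; content=&quot;this blogs is about python&quot;&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;table border = &quot;1&quot;&gt;
&lt;tr&gt;
&lt;td&gt;
Author
&lt;/td&gt;
&lt;td&gt;
Hongten
&lt;/td&gt;
&lt;td&gt;
Mail
&lt;/td&gt;
&lt;td&gt;
hongtenzone@foxmail.com
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
Blos
&lt;/td&gt;
&lt;td&gt;
&lt;a href=&quot;http://www.blogs.com/hongten&quot;&gt;http://www.blogs.com/hongten&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;
QQ
&lt;/td&gt;
&lt;td&gt;
648719819
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/body&gt;
&lt;/html&gt;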
>>>
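Comparing the source content with the escaped content makes the job of html.escape() clear: every &, <, >, " and ' in the input is replaced by the corresponding HTML entity, and everything else is left unchanged.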
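The demo was run on Python 3.3.2, where escape() is the only function in the module. Python 3.4 and later also provide html.unescape(), which reverses the conversion. The following is a small sketch of the round trip, assuming Python 3.4 or newer; the sample string is invented for illustration.

```python
import html

# Hypothetical sample; any string with special characters would do.
original = '<td class="mail">hongtenzone@foxmail.com & more</td>'

escaped = html.escape(original)

# html.unescape() is only available in Python 3.4 and later.
restored = html.unescape(escaped)

print(escaped)
print(restored == original)
```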
Code:
#python html

#Author : Hongten
#Mailto : hongtenzone@foxmail.com
#Blog : http://www.cnblogs.com/hongten
#QQ : 648719819
#Create : 2013-08-26
#Version : 1.0

import html

'''
In Python 3.3, the html module provides only one function:
    html.escape(s, quote=True)
It converts the special characters &, <, and > in a string to HTML-safe
sequences; when quote is True (the default), " and ' are converted as well.
'''

# global variable
# contents of the source HTML file
HTML_STR = ''

def html_escape(html_str):
    '''Escape the special characters in html_str.'''
    return html.escape(html_str)

def init():
    '''Fill the global HTML_STR with the sample document.'''
    global HTML_STR
    HTML_STR = '''
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title> Python Html module </title>
<meta name="Generator" content="EditPlus">
<meta name="Author" content="Hongten">
<meta name="Keywords" content="hongten,python">
<meta name="Description" content="this blogs is about python">
</head>

<body>
<table border = "1">
<tr>
<td>
Author
</td>
<td>
Hongten
</td>
<td>
Mail
</td>
<td>
hongtenzone@foxmail.com
</td>
</tr>
<tr>
<td>
Blos
</td>
<td>
<a href="http://www.blogs.com/hongten">http://www.blogs.com/hongten</a>
</td>
<td>
QQ
</td>
<td>
648719819
</td>
</tr>
</table>
</body>
</html>
'''

def main():
    init()
    print('Source HTML file:{}'.format(HTML_STR))
    print('#' * 50)
    # escaped_str holds the HTML-safe version of the source text
    escaped_str = html_escape(HTML_STR)
    print('Escaped HTML file:{}'.format(escaped_str))

if __name__ == '__main__':
    main()
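A typical use for html.escape() is making untrusted text safe before placing it inside markup. The sketch below is only an illustration of that idea; render_comment and the sample input are hypothetical names that do not appear in the demo.

```python
import html

def render_comment(user_text):
    '''Wrap untrusted text in a paragraph after escaping it (hypothetical helper).'''
    return '<p>{}</p>'.format(html.escape(user_text))

# The tags in the input are shown as text instead of being interpreted by the browser.
rendered = render_comment('<script>alert("x")</script>')
print(rendered)
```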