http://docs.python.org/2/library/urlparse.html?highlight=urlparse#urlparse
The main functions are as follows:
1. urlparse
#!/usr/bin/python
import urlparse

webURL = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
#parseTuple = urlparse.urlsplit(webURL)
parseTuple = urlparse.urlparse(webURL)
print parseTuple
The output is:
ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='hl=en&q=python&btnG=Google+Search', fragment='')
As we can see, the result has six parts, returned as the tuple (scheme, netloc, path, params, query, fragment).
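The six parts can be read either by name or by position. The sketch below is a minimal illustration, not part of the original post; the import fallback is an addition so that it also runs on Python 3, where the same functions live in urllib.parse.

```python
from __future__ import print_function

try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3 (module was renamed)

web_url = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
result = urlparse(web_url)

# Fields can be read by attribute name ...
scheme = result.scheme
host = result.netloc
query = result.query

# ... or by index, because the result is also an ordinary 6-tuple.
same_scheme = result[0]
field_count = len(result)

print(scheme, host, query, field_count)
```

Attribute access is usually easier to read than indexing, and it does not break if you later switch to urlsplit, whose tuple has a different length.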
2. urlparse.urlunparse(parts)
#!/usr/bin/python
import urlparse

URLschema = "ftp"
webURL = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
#parseTuple = urlparse.urlsplit(webURL)
parseTuple = urlparse.urlparse(webURL)
print parseTuple
u = urlparse.urlunparse((URLschema, parseTuple.netloc, parseTuple.path,
                         parseTuple.params, parseTuple.query, ''))
print u
The result is shown below; the parts have been recombined into a new URL:
ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='hl=en&q=python&btnG=Google+Search', fragment='')
ftp://www.google.com/search?hl=en&q=python&btnG=Google+Search
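Because the parse result is a named tuple, one alternative to rebuilding the 6-tuple by hand is to swap a single field with `_replace` and pass the result straight to urlunparse. This is a sketch of that alternative, not the approach used in the post; the variable names are illustrative and the import fallback is an addition for Python 3.

```python
from __future__ import print_function

try:
    from urlparse import urlparse, urlunparse        # Python 2
except ImportError:
    from urllib.parse import urlparse, urlunparse    # Python 3

web_url = "http://www.google.com/search?hl=en&q=python&btnG=Google+Search"
parts = urlparse(web_url)

# _replace returns a new named tuple with only the listed fields changed.
ftp_url = urlunparse(parts._replace(scheme="ftp"))

# Unparsing the untouched parts gives back the original URL here.
round_trip = urlunparse(parts)

print(ftp_url)
print(round_trip == web_url)
```

Only the scheme changes; the network location, path, and query are carried over unchanged.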
3. urlparse.urlsplit(urlstring[, scheme[, allow_fragments]])
This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier). For the same URL as above, the output is:
SplitResult(scheme='http', netloc='www.google.com', path='/search', query='hl=en&q=python&btnG=Google+Search', fragment='')
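The difference between the 5-tuple and the 6-tuple only shows up when the path carries `;parameters`: urlparse separates them into params, while urlsplit leaves them inside path. The sketch below illustrates this with a made-up URL (www.example.com and its path are assumptions, not from the original post); the import fallback is an addition for Python 3.

```python
from __future__ import print_function

try:
    from urlparse import urlparse, urlsplit, urlunsplit        # Python 2
except ImportError:
    from urllib.parse import urlparse, urlsplit, urlunsplit    # Python 3

# Hypothetical URL whose last path segment carries ";type=a".
url = "http://www.example.com/search;type=a?hl=en#top"

parsed = urlparse(url)   # 6 fields: params is split off from the path
split = urlsplit(url)    # 5 fields: params stay inside the path

print(parsed.path, parsed.params)
print(split.path)

# urlunsplit is the inverse of urlsplit, as urlunparse is for urlparse.
rebuilt = urlunsplit(split)
print(rebuilt)
```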
4. urlparse.urljoin(base, url[, allow_fragments])
Its main purpose is to join a base URL with another URL, typically a relative one, to produce an absolute URL.
#-*- coding:utf-8 -*-
import urlparse

# Test 1: "../" steps up one directory from the base
base_url = "http://motor.blog.51cto.com/blog/addblog.php"
relative_url = "../blog/test.php"
abs_url = urlparse.urljoin(base_url, relative_url)
print abs_url

# Test 2: the last path segment of the base (a file) is replaced
base_url_2 = "http://motor.blog.51cto.com/blog/addblog.php"
relative_url_2 = "test.php"
abs_url_2 = urlparse.urljoin(base_url_2, relative_url_2)
print abs_url_2

# Test 3: the base ends with "/", so the relative URL is appended
base_url_3 = "http://motor.blog.51cto.com/blog/"
relative_url_3 = "test.php"
abs_url_3 = urlparse.urljoin(base_url_3, relative_url_3)
print abs_url_3

# Test 4: no trailing "/", so "blog" is treated as a file and replaced
base_url_4 = "http://motor.blog.51cto.com/blog"
relative_url_4 = "test.php"
abs_url_4 = urlparse.urljoin(base_url_4, relative_url_4)
print abs_url_4
The result is:
http://motor.blog.51cto.com/blog/test.php
http://motor.blog.51cto.com/blog/test.php
http://motor.blog.51cto.com/blog/test.php
http://motor.blog.51cto.com/test.php
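The last result is the one that tends to surprise: without a trailing slash, the final segment of the base path is treated as a file name and replaced. The sketch below shows one way to guard against that. The helper `join_under` is hypothetical (not part of urlparse) and assumes the base has no query string or fragment; the import fallback is an addition for Python 3.

```python
from __future__ import print_function

try:
    from urlparse import urljoin        # Python 2
except ImportError:
    from urllib.parse import urljoin    # Python 3


def join_under(base, relative):
    """Hypothetical helper: treat base as a directory when joining."""
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, relative)


base = "http://motor.blog.51cto.com/blog"

plain = urljoin(base, "test.php")           # "blog" is replaced
guarded = join_under(base, "test.php")      # "blog/" is kept
rooted = urljoin(base + "/", "/test.php")   # a leading "/" restarts at the site root
absolute = urljoin(base, "http://www.google.com/search")  # an absolute URL wins

print(plain)
print(guarded)
print(rooted)
print(absolute)
```

Whether to add the slash depends on whether the base really names a directory, so a helper like this fits best where the caller already knows that.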