I want to download some files from this site: http://www.emuparadise.me/soundtracks/highquality/index.php
But I only want to get certain specific ones.
Is there a way to write a Python script to do this? I have intermediate knowledge of Python.
I'm just looking for a bit of guidance, so please point me towards a wiki or a library to accomplish this.
Thanks, Shrub
Posted on 2012-09-26 06:34:35
I looked at that page. The links seem to redirect to another page where the file is hosted, and clicking that page downloads the file.
I would use mechanize to follow the required links to the right page, then use BeautifulSoup or lxml to parse the resulting page and get the file name.
Then open the file with urlopen and write its contents out to a local file, like this:
from urllib import urlopen  # Python 2

f = open(localFilePath, 'wb')  # binary mode, since these are audio files
f.write(urlopen(remoteFilePath).read())
f.close()
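A minimal sketch of that whole flow, assuming Python 2 with mechanize and BeautifulSoup 3 installed; the soundtrack title, and the idea that every anchor on the album page is worth saving, are placeholder assumptions you would adjust to the real page:

import mechanize
from urllib import urlopen
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()
br.set_handle_robots(False)  # the site may disallow robots

br.open("http://www.emuparadise.me/soundtracks/highquality/index.php")

# Follow the index link whose text matches the album we want (hypothetical title).
response = br.follow_link(text_regex=r"Chrono Trigger")
soup = BeautifulSoup(response.read())

# Save the target of each anchor on the album page; a real script would
# first filter these down to the actual download links.
for a in soup.findAll("a", href=True):
    fileUrl = urljoin(response.geturl(), a["href"])
    localName = fileUrl.rstrip("/").split("/")[-1]
    f = open(localName, "wb")
    f.write(urlopen(fileUrl).read())
    f.close()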
Hope this helps.
Posted on 2013-08-01 00:27:11
Make a url request for the page. Once you have the source, filter it to get the urls.
The files you want to download are the urls that contain a specific extension. With that, you can do a regular-expression search for all the urls that match your criteria. After filtering, perform a url request for each matching url's data and write it out to a local file.
Sample code:
#!/usr/bin/python
import re
import sys
import urllib

#Your sample url
sampleUrl = "http://stackoverflow.com"
urlAddInfo = urllib.urlopen(sampleUrl)
data = urlAddInfo.read()

#Sample extensions we'll be looking for: pngs and pdfs
TARGET_EXTENSIONS = r"\.(png|pdf)\b"
targetCompile = re.compile(TARGET_EXTENSIONS, re.UNICODE | re.MULTILINE)

#Let's get all the urls: match criteria{no spaces or " in a url}
urls = re.findall(r'(https?://[^\s"]+)', data, re.UNICODE | re.MULTILINE)

#We want these folks
extensionMatches = filter(lambda url: url and targetCompile.search(url), urls)

#The rest of the unmatched urls, for which the scraping can also be repeated
nonExtMatches = filter(lambda url: url and not targetCompile.search(url), urls)

def fileDl(targetUrl):
    #Function to handle downloading of files.
    #Arg: url => a String
    #Output: Boolean to signify whether the file was written to disk
    #Validation of the url is assumed, for the sake of keeping the illustration short
    urlAddInfo = urllib.urlopen(targetUrl)
    data = urlAddInfo.read()
    fileNameSearch = re.search(r"([^/\s]+)$", targetUrl) #Text after the last slash '/'
    if not fileNameSearch:
        sys.stderr.write("Could not extract a filename from url '%s'\n" % (targetUrl))
        return False
    fileName = fileNameSearch.group(1)
    with open(fileName, "wb") as f:
        f.write(data)
    sys.stderr.write("Wrote %s to disk\n" % (fileName))
    return True

#Let's now download the matched files
dlResults = map(lambda fUrl: fileDl(fUrl), extensionMatches)
successfulDls = filter(lambda s: s, dlResults)
sys.stderr.write("Downloaded %d files from %s\n" % (len(successfulDls), sampleUrl))

#You can organize the above code into a function to repeat the process for each of the
#other urls and in that way you can make a crawler.
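Picking up that closing comment, here is a rough sketch of such a crawler loop, reusing fileDl and targetCompile from above; the depth cap and the visited set are illustrative choices, not part of the original code:

def crawl(url, seen, depth=2):
    #Scrape one page: download the matching urls, recurse into the rest.
    if depth <= 0 or url in seen:
        return
    seen.add(url)
    data = urllib.urlopen(url).read()
    urls = re.findall(r'(https?://[^\s"]+)', data, re.UNICODE | re.MULTILINE)
    for u in urls:
        if targetCompile.search(u):
            fileDl(u)
        else:
            crawl(u, seen, depth - 1)

crawl(sampleUrl, set())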
The code above was written mainly for Python 2.x.
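If you are on Python 3, the same flow needs only small changes; a quick sketch of the equivalents, using the same regexes as above:

import re
import sys
import urllib.request

data = urllib.request.urlopen("http://stackoverflow.com").read().decode("utf-8", "replace")
urls = re.findall(r'(https?://[^\s"]+)', data)
#map and filter are lazy in Python 3, so materialize the results:
extensionMatches = [u for u in urls if re.search(r"\.(png|pdf)\b", u)]
sys.stderr.write("Found %d matching urls\n" % len(extensionMatches))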
Posted on 2012-09-26 06:40:09
https://stackoverflow.com/questions/12591959