Q: (Python) Trying to isolate some data from a website

Stack Overflow user

Asked 2014-02-03 08:48:00

Answers 2, Views 107, Followers 0, Votes 0

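Basically, the script will download images from the random and toplist pages of wallbase.cc. In essence, it looks for a 7-digit string that uniquely identifies each image, inserts that id into a URL, and downloads the image. My only problem seems to be isolating the 7-digit string.

What I want to do is:

Search for <div id="thumbxxxxxxx" and then assign xxxxxxx to a variable.

Here is what I have so far.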

import urllib
import os
import sys
import re


#Written in Python 2.7 with LightTable


def get_id():
    import urllib2  # Python 2.7 counterpart of urllib.request
    # urlid is 'toplist' or 'random', set from the prompt below
    response = urllib2.urlopen('http://wallbase.cc/' + urlid)
    the_page = response.read()
    # Stuck here: need to pull each 7-digit id out of the_page
    return the_page


def toplist():
    #We need to define how to find the images to download
    #The idea is to go to http://wallbase.cc/x and to take all of strings containing <a href="http://wallbase.cc/wallpaper/xxxxxxx" </a>
    #And to request the image file from that URL.
    #Then the file will be put in a user defined directory

    image_id = raw_input("Enter the seven digit identifier for the image to be downloaded to "+ directory+ "...\n>>> ")

    f = open(directory+image_id+ '.jpg','wb')
    f.write(urllib.urlopen('http://wallpapers.wallbase.cc/rozne/wallpaper-'+image_id+'.jpg').read())
    f.close()


directory = raw_input("Enter the directory in which the images will be downloaded.\n>>> ")

initial_prompt = input("What do you want to download from?\n\t1: Toplist\n\t2: Random\n>>> ")
if initial_prompt == 1:
    urlid = 'toplist'
    toplist()

elif initial_prompt == 2:
    urlid = 'random'
    random()

Any and all help is greatly appreciated :)

python

urllib

Stack Overflow user

Answered 2014-02-03 08:58:26

If you only want to use the standard library, you can use a regular expression.

pattern = re.compile(r'<div id="thumb(.{7})"')

...

for data_id in re.findall(pattern, the_page):
    pass # do something with data_id
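
As a fuller, self-contained sketch of the same idea, the snippet below runs the pattern against a small made-up HTML fragment and builds a download URL for each id. It assumes the real wallbase.cc markup looks like the <div id="thumbxxxxxxx" described in the question, and it reuses the URL format from the question's toplist() function. It uses \d{7} rather than .{7} so that only digits match. The helper names extract_ids and build_image_url, and the sample ids, are hypothetical.

```python
import re

# Matches <div id="thumb1234567" and captures the seven digits.
THUMB_PATTERN = re.compile(r'<div id="thumb(\d{7})"')


def extract_ids(html):
    """Return every 7-digit thumbnail id found in html, in page order."""
    return THUMB_PATTERN.findall(html)


def build_image_url(image_id):
    """Build the image URL using the format from the question's toplist()."""
    return 'http://wallpapers.wallbase.cc/rozne/wallpaper-' + image_id + '.jpg'


# Made-up fragment standing in for the downloaded page (an assumption).
sample_page = '''
<div id="thumb1234567" class="thumbnail"></div>
<div id="thumb7654321" class="thumbnail"></div>
<div id="thumbpreview"></div>
'''

ids = extract_ids(sample_page)
urls = [build_image_url(i) for i in ids]
```

With .{7}, the third div would also match and yield 'preview', which is why the sketch restricts the capture group to digits. In Python 2.7, the_page from response.read() is a str and can be passed to extract_ids directly.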

Votes 0


Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/21518249
