How to extract data from a dropdown menu using Python Beautiful Soup

Stack Overflow user
Asked 2019-05-27 14:16:25
1 answer, 4.1K views, 0 followers, score 4

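I am trying to scrape data from a website that has a multi-level dropdown menu: every time an item is selected, the items in the child dropdown change. The problem is that on every pass through my loop, it extracts the same sub-items from the dropdown. The selection does happen, but the items are not updated to reflect the new selection made by the loop. Can anyone help me understand why I am not getting the result I want? Maybe it is because my dropdown is built with JavaScript or something similar.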

For example, the menu in the image below:
[image not available]

This is how far I have gotten:


from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
import csv

#from selenium.webdriver.support import Select 
import time

print ("opening chrome....")  
driver = webdriver.Chrome()
driver.get('https://www.wheelmax.com/')
time.sleep(10)

csvData = ['Year', 'Make', 'Model', 'Body', 'Submodel', 'Size']

#variables
yeart = []
make= []
model=[]
body = []
submodel = []
size = []
Yindex = Mkindex = Mdindex = Bdindex = Smindex = Sindex = 0

print ("waiting for program to set variables....")
time.sleep(20)

print ("initializing and setting variables....")

#initializing Year
Year = Select(driver.find_element_by_id("icm-years-select"))
Year.select_by_value('2020')
yr = driver.find_elements(By.XPATH, '//*[@id="icm-years-select"]')
time.sleep(15)

#initializing Make
Make = Select(driver.find_element_by_id("icm-makes-select"))
Make.select_by_index(1)
mk = driver.find_elements(By.XPATH, '//*[@id="icm-makes-select"]')
time.sleep(15)

#initializing Model
Model = Select(driver.find_element_by_id("icm-models-select"))
Model.select_by_index(1)
mdl = driver.find_elements(By.XPATH, '//*[@id="icm-models-select"]')
time.sleep(15)

#initializing body
Body = Select(driver.find_element_by_id("icm-drivebodies-select"))
Body.select_by_index(1)
bdy = driver.find_elements(By.XPATH, '//*[@id="icm-drivebodies-select"]')
time.sleep(15)

#initializing submodel
Submodel = Select(driver.find_element_by_id("icm-submodels-select"))
Submodel.select_by_index(1)
sbm = driver.find_elements(By.XPATH, '//*[@id="icm-submodels-select"]')
time.sleep(15)

#initializing size
Size = Select(driver.find_element_by_id("icm-sizes-select"))
Size.select_by_index(0)
siz = driver.find_elements(By.XPATH, '//*[@id="icm-sizes-select"]')
time.sleep(5)


Cyr = Cmk = Cmd = Cbd = Csmd = Csz = ""

print ("fetching data from variables....")

for y in yr:
    obj1 = driver.find_element_by_id("icm-years-select")
    Year = Select(obj1)
    Year.select_by_index(++Yindex)
    obj1.click()
    #obj1.click()
    yeart.append(y.text)
    Cyr = y.text
    time.sleep(10)
    for m in mk:
        obj2 = driver.find_element_by_id("icm-makes-select")
        Make = Select(obj2)
        Make.select_by_index(++Mkindex)
        obj2.click()
        #obj2.click()
        make.append(m.text)
        Cmk = m.text
        time.sleep(10)
        for md in mdl:
            Mdindex =0
            obj3 = driver.find_element_by_id("icm-models-select")
            Model = Select(obj3)
            Model.select_by_index(++Mdindex)
            obj3.click()
            #obj3.click(clickobj)
            model.append(md.text)
            Cmd = md.text
            time.sleep(10)
            Bdindex = 0
            for bd in bdy:
                obj4 = driver.find_element_by_id("icm-drivebodies-select")
                Body = Select(obj4)
                Body.select_by_index(++Bdindex)
                obj4.click()
                #obj4.click(clickobj2)
                body.append(bd.text)
                Cbd = bd.text
                time.sleep(10)
                Smindex = 0
                for sm in sbm:
                    obj5 = driver.find_element_by_id("icm-submodels-select")
                    Submodel = Select(obj5)
                    obj5.click()
                    Submodel.select_by_index(++Smindex)
                    #obj5.click(clickobj5)
                    submodel.append(sm.text)
                    Csmd = sm.text
                    time.sleep(10)
                    Sindex = 0
                    for sz in siz:
                        Size = Select(driver.find_element_by_id("icm-sizes-select"))
                        Size.select_by_index(++Sindex)
                        size.append(sz.text)
                        Csz = sz.text
                        csvData += [Cyr, Cmk, Cmd, Cbd,Csmd, Csz]
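
Separately from the JavaScript issue, the loop above has some plain-Python problems worth ruling out. ++Yindex is not an increment in Python: it parses as +(+Yindex), evaluates to the current value, and leaves the variable unchanged, so every select_by_index(++...) call in the loop receives 0. find_elements(By.XPATH, '//*[@id="icm-years-select"]') returns a list holding only the select element itself (assuming the id is unique on the page), so each for loop body runs once and .text gives the text of the whole menu rather than one option. And csvData += [...] extends one flat list instead of adding a row. The stdlib-only snippet below demonstrates the first and last points and one way to collect rows for csv.writer; the row values are placeholders, not scraped data.

```python
import csv
import io

# 1) "++Yindex" is not an increment in Python: it parses as +(+Yindex).
Yindex = 0
value = ++Yindex
print(value, Yindex)   # prints: 0 0  (select_by_index(++Yindex) would always get 0)
Yindex += 1            # this is the actual increment
print(Yindex)          # prints: 1

# 2) "list += list" extends one flat list instead of adding a row.
flat = ["Year", "Make"]
flat += ["2020", "Make-1"]     # placeholder values
print(flat)                    # prints: ['Year', 'Make', '2020', 'Make-1']

# Keep one list per CSV row instead and let csv.writer handle quoting.
header = ["Year", "Make", "Model", "Body", "Submodel", "Size"]
rows = [["2020", "Make-1", "Model-1", "Body-1", "Submodel-1", "Size-1"]]  # placeholder row

buffer = io.StringIO()   # stands in for open("wheels.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

In the real scraper, the loop counters can be replaced by iterating over range(len(select.options)) or enumerate(), which avoids manual index bookkeeping entirely.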

1 Answer

Stack Overflow user

Answered 2019-05-27 15:51:00

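I guess the reason you can't parse the years with Beautiful Soup is that, at the time the page is downloaded for Beautiful Soup, the 'select' tag containing the 'option' tags for all the years is not there yet (or is hidden). I assume it is added to the DOM by additional JavaScript that runs after the page loads. If you inspect the DOM of the loaded page with your browser's developer tools (for example, F12 in Mozilla Firefox), you will see that the tag containing the information you are looking for is <select id="icm-years-select">. If you try to find this tag in the object downloaded for Beautiful Soup, you get an empty list of tag objects: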

from bs4 import BeautifulSoup
from requests import get
response = get('https://www.wheelmax.com/')
yourSoup = BeautifulSoup(response.text, "lxml")  # the "lxml" parser needs the lxml package
print(len(yourSoup.select('div #vehicle-search')))  # length = 1 -> present in the downloaded HTML
print()
print(len(yourSoup.select('#icm-years-select')))    # length = 0 -> not in the downloaded HTML

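So if you want to get the years with Python one way or another, I guess you could try clicking the corresponding tag and then parsing again with some combination of the requests, Beautiful Soup, and/or Selenium modules, which will take more digging :-)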
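
Following that suggestion, here is one way to do the selecting and re-reading with Selenium alone. It is a minimal sketch, not the answerer's or the asker's code, and it rests on assumptions: the six select ids are the ones from the question and still match the live page, index 0 of every menu is a placeholder entry, and a child menu counts as refreshed once its option texts differ from what they were before the parent selection (with a timeout fallback in case two parents share an identical child list). The names walk, scrape, read_options, choose and the output file wheels.csv are hypothetical. The traversal in walk() receives its browser access as two callbacks, so it can be checked without a browser, and the Selenium imports sit inside scrape() for the same reason. It uses the Selenium 4 find_element(By.ID, ...) form, because the find_element_by_* helpers used in the question were removed in Selenium 4.3.

```python
import csv
from typing import Callable, Iterator, List, Sequence, Tuple

# Select ids taken from the question; assumed to still match the live page.
SELECT_IDS = [
    "icm-years-select", "icm-makes-select", "icm-models-select",
    "icm-drivebodies-select", "icm-submodels-select", "icm-sizes-select",
]


def walk(ids: Sequence[str],
         read_options: Callable[[str], List[str]],
         choose: Callable[[str, int], None],
         prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Depth-first walk over cascading menus, yielding one tuple per full combination."""
    if not ids:
        yield prefix
        return
    head, rest = ids[0], ids[1:]
    # Assumption: index 0 is a placeholder such as "Select Year", so start at 1.
    for i in range(1, len(read_options(head))):
        # Re-read on every pass instead of reusing a list captured before the loop.
        text = read_options(head)[i]
        choose(head, i)  # must not return until the child menu has been refreshed
        yield from walk(rest, read_options, choose, prefix + (text,))


def scrape(url: str = "https://www.wheelmax.com/", out_path: str = "wheels.csv",
           timeout: float = 20) -> None:
    # Third-party imports kept here so walk() can be tested without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException, TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select, WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, timeout,
                         ignored_exceptions=(StaleElementReferenceException,))

    def read_options(select_id: str) -> List[str]:
        # Locate the <select> fresh on every call; old element references can go stale.
        element = wait.until(lambda d: d.find_element(By.ID, select_id))
        return [option.text for option in Select(element).options]

    def choose(select_id: str, index: int) -> None:
        pos = SELECT_IDS.index(select_id)
        child = SELECT_IDS[pos + 1] if pos + 1 < len(SELECT_IDS) else None
        before = read_options(child) if child else None
        Select(driver.find_element(By.ID, select_id)).select_by_index(index)
        if child:
            try:
                # Assumed refresh signal: the child's option texts have changed.
                wait.until(lambda d: read_options(child) != before)
            except TimeoutException:
                pass  # two parents may legitimately share an identical child list

    try:
        driver.get(url)
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Year", "Make", "Model", "Body", "Submodel", "Size"])
            for row in walk(SELECT_IDS, read_options, choose):
                writer.writerow(row)
    finally:
        driver.quit()
```

Calling scrape() starts Chrome, walks every combination depth-first, and writes one CSV row per combination. Because the number of combinations multiplies across six menus, a full run may take a long time; the fixed time.sleep() calls are replaced by explicit waits, so each step only waits as long as the page needs.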

Otherwise, if you just need to get the years quickly, use JavaScript (for example, in the browser's developer console):

const yearSelect = document.getElementById('icm-years-select');
const yearArray = [];
for (let i = 0; i < yearSelect.length; i++) { yearArray.push(yearSelect[i].value); }
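
If the rest of the scraper stays in Python, the same JavaScript can run through Selenium's execute_script(), which returns a JavaScript array to Python as a list. A minimal sketch, assuming a Selenium driver that has already loaded the page and that the years menu has been populated by then; the names YEARS_JS and years_from_browser are hypothetical.

```python
# JavaScript mirroring the console snippet above, but returning the values to the caller.
# Selenium runs the script as a function body, so a top-level "return" is allowed.
YEARS_JS = """
const select = document.getElementById('icm-years-select');
return select ? Array.from(select.options, option => option.value) : [];
"""


def years_from_browser(driver) -> list:
    """Return the option values of #icm-years-select from the rendered DOM.

    Assumes `driver` is a Selenium WebDriver that has already loaded the page;
    execute_script() hands the JavaScript array back as a Python list.
    """
    return driver.execute_script(YEARS_JS)
```

For example, after driver.get('https://www.wheelmax.com/') and a wait for the menu, years_from_browser(driver) returns the values as strings; the first one may belong to a placeholder option.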
Score: 0
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/56320560
