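I'm trying to scrape a web table from the website shown in my code. Basically, I only want to scrape today's games, so my for loop should stop when it reaches the part of the HTML table that holds the next day's games. I've tried Googling this but still can't solve it. Any help would be greatly appreciated. My code is posted below.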
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.oddsportal.com/baseball/usa/mlb/'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
driver.find_element_by_id('user-header-timezone-expander').click()  # get to est timezone
time.sleep(2)
driver.find_element_by_xpath("//*[contains(text(), 'GMT - 4')]").click()  # get to est timezone
time.sleep(2)
content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
file_dates = []
todays_games = soup.find('table', {'class': 'table-main'})
dummy_row = soup.find_all(attrs={'class': 'table-dummyrow'})
for games in todays_games.select('td.table-time.datet'):  # gets the time of the game
    games = [games.text]
    file_dates.append(games)
    if dummy_row == dummy_row[1]:  # I want the for loop to break when it hits the gray header titled "Tomorrow, 22 Jul" on the webpage
        break
print(file_dates)  # still returns every game on the website though
Posted on 2018-07-22 06:34:34
To get only today's game times, you can try the following code:
games = [td.text for td in driver.find_elements_by_xpath(
    '//table[@id="tournamentTable"]//td[contains(@class, "datet") '
    'and following::span[starts-with(., "Tomorrow,")]]')]
print(games)
If you still want to use bs4, try:
file_dates = []
todays_games = soup.find('table', {'class': 'table-main'})
for games in todays_games.select('tr')[2:]:  # skip the two header rows at the top
    if games.select('td.datet'):
        file_dates.append(games.select('td.datet')[0].text)
    if games.select('th'):  # the next header row starts the following day
        break
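The row-by-row idea above can be tried without a browser. The following is only a minimal sketch: `SAMPLE_HTML` is a made-up table that assumes the page has two header rows, then today's games, then a `th` header row for the next day, and `todays_game_times` is a hypothetical helper name. It uses the built-in `html.parser` backend instead of `lxml` so it needs nothing beyond bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumed simplification of the real page,
# not a copy of it.
SAMPLE_HTML = """
<table class="table-main" id="tournamentTable">
  <tr><th colspan="2">Baseball - USA - MLB</th></tr>
  <tr><th colspan="2"><span>Today, 21 Jul</span></th></tr>
  <tr><td class="table-time datet">19:05</td><td>Team A - Team B</td></tr>
  <tr><td class="table-time datet">20:10</td><td>Team C - Team D</td></tr>
  <tr><th colspan="2"><span>Tomorrow, 22 Jul</span></th></tr>
  <tr><td class="table-time datet">18:35</td><td>Team E - Team F</td></tr>
</table>
"""


def todays_game_times(html):
    """Collect game times until the next date header row is reached."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "table-main"})
    times = []
    # Skip the first two rows (league header and today's date header).
    for row in table.select("tr")[2:]:
        # A row containing a <th> is the next day's header: stop here.
        if row.find("th") is not None:
            break
        cell = row.select_one("td.datet")
        if cell is not None:
            times.append(cell.get_text(strip=True))
    return times


print(todays_game_times(SAMPLE_HTML))  # ['19:05', '20:10']
```

Checking for the header before reading the time cell means the loop stops on the row itself, so nothing from the following day is collected even if that row also contained a time cell.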
https://stackoverflow.com/questions/51460504