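I am trying to parse the HTML of a live sports results website, but my code is not returning all of the span tags on the site. When I inspect the page I can see all the matches, but my code does not seem to find anything from the site apart from the footer or header. I also tried divs, but those did not work either. I'm new to this and a bit lost. Here is my code; can anyone help? I left out the first part of the for loop for clarity.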
#Creating the urls for the different dates
my_url='https://www.livescore.com/en/football/{}'.format(d1)
print(my_url)
today=date.today()-timedelta(days=i)
d1 = today.strftime("%Y-%m-%d/")
#Opening up the connection and grabbing the html
uClient=uReq(my_url)
page_html=uClient.read()
uClient.close()
#HTML parser
page_soup=soup(page_html,"html.parser")
spans=page_soup.findAll("span")
matches=page_soup.findAll("div", {"class":"LiveRow-w0tngo-0 styled__Root-sc-2sc0sh-0 styled__FootballRoot-sc-2sc0sh-4 eAwOMF"})
print(spans)
Answered on 2021-02-17 17:53:34
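The page is dynamic and rendered by JavaScript. When you make a request, you get the static HTML response as it is before rendering. There are a few ways to handle this:
1. Use a browser-automation tool such as Selenium, which opens a real browser, lets the page render, and then gives you the rendered HTML to parse.
2. Use the requests-HTML package, which can also render the page (I have not tried this package myself, because it conflicts with my IDE, Spyder). This is similar to Selenium, but without actually opening a browser.
3. Check whether the data is embedded, in JSON form, in <script> tags. Sometimes you will find it there, but it takes some work to pull it out and conform/manipulate it into valid JSON so that it can be read with json.loads().
4. See whether there is an API of some sort (check the XHR requests in the browser's developer tools) and fetch the data directly from there.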
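Option 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample HTML and the `window.__DATA__` variable name are made up for the example and are not taken from the livescore page, and the helper names are hypothetical. It uses only the standard library so that it runs without extra packages.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def extract_embedded_json(html, marker):
    """Return the JSON object that follows `marker` in a <script> tag, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        start = script.find(marker)
        if start == -1:
            continue
        brace = script.find("{", start)
        if brace == -1:
            continue
        # raw_decode parses one JSON value and ignores what follows it (e.g. ";").
        obj, _ = json.JSONDecoder().raw_decode(script[brace:])
        return obj
    return None


# Hypothetical static HTML; the variable name window.__DATA__ is an assumption.
SAMPLE_HTML = """
<html><body>
<div id="root"></div>
<script>window.__DATA__ = {"Stages": [{"Snm": "Premier League", "Events": []}]};</script>
</body></html>
"""

data = extract_embedded_json(SAMPLE_HTML, "window.__DATA__")
print(data["Stages"][0]["Snm"])
```

On a real page the marker and the surrounding JavaScript will differ, so expect to adjust how the start of the JSON is located.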
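The best option is always #4, if it is available. Why? Because the data is consistently structured. Even if the site changes its layout or its CSS (which would change the HTML you parse), the underlying data that feeds it keeps the same structure. This site does have an API for accessing the data: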
import requests
import datetime

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'}
dates_list = ['20210214', '20210215', '20210216']

for dateStr in dates_list:
    url = f'https://prod-public-api.livescore.com/v1/api/react/date/soccer/{dateStr}/0.00'
    dateStr_alpha = datetime.datetime.strptime(dateStr, '%Y%m%d').strftime('%B %d')
    response = requests.get(url, headers=headers).json()

    stages = response['Stages']
    for stage in stages:
        location = stage['Cnm']
        stageName = stage['Snm']
        events = stage['Events']

        print('\n\n%s - %s\t%s' % (location, stageName, dateStr_alpha))
        print('*' * 50)
        for event in events:
            outcome = event['Eps']
            team1Name = event['T1'][0]['Nm']
            # The score keys are missing when there is no score (e.g. postponed)
            team1Goals = event.get('Tr1', '?')
            team2Name = event['T2'][0]['Nm']
            team2Goals = event.get('Tr2', '?')
            print('%s\t%s %s - %s %s' % (outcome, team1Name, team1Goals, team2Name, team2Goals))
Output:
England - Premier League February 15
********************************************************************************
FT West Ham United 3 - Sheffield United 0
FT Chelsea 2 - Newcastle United 0
Spain - LaLiga Santander February 15
********************************************************************************
FT Cadiz 0 - Athletic Bilbao 4
Germany - Bundesliga February 15
********************************************************************************
FT Bayern Munich 3 - Arminia Bielefeld 3
Italy - Serie A February 15
********************************************************************************
FT Hellas Verona 2 - Parma Calcio 1913 1
Portugal - Primeira Liga February 15
********************************************************************************
FT Sporting CP 2 - Pacos de Ferreira 0
Belgium - Jupiler League February 15
********************************************************************************
FT Gent 4 - Royal Excel Mouscron 0
Belgium - First Division B February 15
********************************************************************************
FT Westerlo 1 - Lommel 1
Turkey - Super Lig February 15
********************************************************************************
FT Genclerbirligi 0 - Besiktas 3
FT Antalyaspor 1 - Yeni Malatyaspor 1
Brazil - Serie A February 15
********************************************************************************
FT Gremio 1 - Sao Paulo 2
FT Ceara 1 - Fluminense 3
FT Sport Recife 0 - Bragantino 0
Italy - Serie B February 15
********************************************************************************
FT Cosenza 2 - Reggina 2
France - Ligue 2 February 15
********************************************************************************
FT Sochaux 2 - Valenciennes 0
FT Toulouse 3 - AC Ajaccio 0
Spain - LaLiga Smartbank February 15
********************************************************************************
FT Castellon 1 - Fuenlabrada 2
FT Real Oviedo 3 - Lugo 1
...
Uganda - Super League February 16
********************************************************************************
FT Busoga United FC 1 - Bright Stars FC 1
FT Kitara FC 0 - Mbarara City 1
FT Kyetume 2 - Vipers SC 2
FT UPDF FC 0 - Onduparaka FC 1
FT Uganda Police 2 - BUL FC 0
Uruguay - Primera División: Clausura February 16
********************************************************************************
FT Boston River 0 - Montevideo City Torque 3
International - Friendlies Women February 16
********************************************************************************
FT Guatemala 3 - Panama 1
Africa - Africa Cup Of Nations U20: Group C February 16
********************************************************************************
FT Ghana U20 4 - Tanzania U20 0
FT Gambia U20 0 - Morocco U20 1
Brazil - Amazonense: Group A February 16
********************************************************************************
Postp. Manaus FC ? - Penarol AC AM ?
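If you want to reuse or test the parsing separately from the network call, the loop in this answer could be factored into small functions that work on an already-decoded response. The sketch below assumes the response has the shape used in the answer's code (`Stages`, `Cnm`, `Snm`, `Events`, `Eps`, `T1`, `T2`, `Nm`, `Tr1`, `Tr2`); the function names and the hand-written sample are hypothetical, and the exact value types in the real API (for instance, whether scores are strings) are an assumption.

```python
def format_event(event):
    """Build one result line from an event dict shaped like the API response."""
    outcome = event["Eps"]
    team1 = event["T1"][0]["Nm"]
    team2 = event["T2"][0]["Nm"]
    # Score keys are absent when no score exists, so fall back to "?".
    goals1 = event.get("Tr1", "?")
    goals2 = event.get("Tr2", "?")
    return f"{outcome}\t{team1} {goals1} - {team2} {goals2}"


def format_stages(response, date_label):
    """Return a header line per stage followed by one line per event."""
    lines = []
    for stage in response.get("Stages", []):
        lines.append(f"{stage['Cnm']} - {stage['Snm']}\t{date_label}")
        lines.extend(format_event(event) for event in stage["Events"])
    return lines


# Hand-written sample mimicking the response shape; not real API output.
SAMPLE_RESPONSE = {
    "Stages": [
        {
            "Cnm": "England",
            "Snm": "Premier League",
            "Events": [
                {
                    "Eps": "FT",
                    "T1": [{"Nm": "Chelsea"}],
                    "T2": [{"Nm": "Newcastle United"}],
                    "Tr1": "2",
                    "Tr2": "0",
                },
                {
                    "Eps": "Postp.",
                    "T1": [{"Nm": "Manaus FC"}],
                    "T2": [{"Nm": "Penarol AC AM"}],
                },
            ],
        }
    ]
}

result_lines = format_stages(SAMPLE_RESPONSE, "February 15")
for line in result_lines:
    print(line)
```

Keeping the formatting in pure functions means you can check it against a saved response without hitting the site on every run.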
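Answered on 2021-02-17 07:06:42
Now, assuming you have the right class to scrape, a simple loop will do:
# page_soup is the parsed BeautifulSoup object from your code
for i in page_soup.find_all("div", {"class":"LiveRow-w0tngo-0 styled__Root-sc-2sc0sh-0 styled__FootballRoot-sc-2sc0sh-4 eAwOMF"}):
    print(i)
Or collect the text in a list:
teams = []
for i in page_soup.find_all("div", {"class":"LiveRow-w0tngo-0 styled__Root-sc-2sc0sh-0 styled__FootballRoot-sc-2sc0sh-4 eAwOMF"}):
    teams.append(i.text)
print(teams)
If that doesn't work, run a few tests to see whether you are actually grabbing the right thing, for example by printing a single element.
Also, in your code I see that you are printing "spans" rather than "matches", which could also be the problem with your code.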
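One quick test along those lines is to count how many elements in the downloaded HTML carry the class at all. The sketch below is only an illustration: the two HTML strings are made up, the helper names are hypothetical, and matching on the `LiveRow` prefix instead of the full generated class string is an assumption about which part of the class name stays stable.

```python
from html.parser import HTMLParser


class ClassPrefixCounter(HTMLParser):
    """Count start tags of one type that have a class beginning with a prefix."""

    def __init__(self, tag, prefix):
        super().__init__()
        self.tag = tag
        self.prefix = prefix
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # A class attribute may be missing or empty, hence the "or ''".
        classes = (dict(attrs).get("class") or "").split()
        if any(name.startswith(self.prefix) for name in classes):
            self.count += 1


def count_matches(html, tag="div", prefix="LiveRow"):
    counter = ClassPrefixCounter(tag, prefix)
    counter.feed(html)
    counter.close()
    return counter.count


# Made-up examples: what a plain request might return vs. a rendered page.
STATIC_HTML = '<html><body><div id="root"></div><span>footer</span></body></html>'
RENDERED_HTML = '<div id="root"><div class="LiveRow-w0tngo-0 eAwOMF">Chelsea</div></div>'

print(count_matches(STATIC_HTML))    # 0: the match rows are not in the static HTML
print(count_matches(RENDERED_HTML))  # 1
```

If the count is zero for the HTML your script downloaded, no selector will find the rows, and the content is most likely being added after the page loads.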
You can also take a look at this, which explains in more detail how to do it.
https://stackoverflow.com/questions/66233334