BeautifulSoup - scraping multiple tables from one page?

Stack Overflow user
Asked 2016-12-23 17:55:23
Answers: 2 | Views: 3.7K | Followers: 0 | Score: 1

I am trying to scrape content from this URL, which contains multiple tables. The expected output would be:

NAME        FG% FT% 3PM REB AST STL BLK TO  PTS     SCORE
Team Jackson (0-8)      .4313   .7500   21  71  34  11  12  15  189     1-8-0
Team Keyrouze (4-4)     .4441   .8090   31  130 71  18  13  45  373     8-1-0
Nutz Vs. Draymond Green (4-4)       .4292   .8769   30  86  66  15  9   28  269     3-6-0
Team Pauls 2 da Wall (3-5)      .4784   .8438   40  123 64  18  20  30  316     6-3-0
Team Noey (2-6)     .4350   .7679   21  125 62  20  9   33  278     7-2-0
YOU REACH, I TEACH (2-5-1)      .4810   .7432   20  114 56  30  7   50  277     2-7-0
Kris Kaman His Pants (5-3)      .4328   .8000   20  74  59  20  5   27  238     3-6-0
Duke's Balls In Daniels Face (3-4-1)        .5000   .7045   42  139 38  27  22  30  303     6-3-0
Knicks Tape (5-3)       .5000   .8152   34  143 92  12  9   47  397     4-5-0
Suck MyDirk (5-3)       .4734   .8814   29  106 86  22  17  40  435     5-4-0
In Porzingod We Trust (4-4)     .4928   .7222   27  180 95  16  16  46  423     7-2-0
Team Aguilar (6-1-1)        .4718   .7053   28  177 65  12  35  48  413     2-7-0
Team Li (7-0-1)     .4714   .8118   35  134 74  17  17  47  368     6-3-0
Team Iannetta (4-4)     .4527   .7302   22  125 90  20  13  44  288     3-6-0

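If that table format is too hard to produce, I would at least like to know how to scrape all of the tables. My code for scraping the rows looks like this: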

tableStats = soup.find('table', {'class': 'tableBody'})
rows = tableStats.findAll('tr')

for row in rows:
    print(row.string)

But it only prints the value "Team" and nothing else. Why doesn't it include all of the rows in the table?

Thanks.


2 Answers

Stack Overflow user

Accepted answer

Posted 2017-01-07 23:43:22

I found a way to get exactly the 2-D matrix I described in the question. It is stored as a list of lists.

Code:

from bs4 import BeautifulSoup
import requests

source_code = requests.get("http://games.espn.com/fba/scoreboard?leagueId=224165&seasonId=2017")
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'lxml')
teams = []
rows = soup.findAll('tr', {'class': 'linescoreTeamRow'})

# Build a 2-D matrix: one list of cell texts per team row.
for row in rows:
    team_row = []
    columns = row.findAll('td')
    for column in columns:
        team_row.append(column.getText())
    print(team_row)
    # Add each team to the teams matrix.
    teams.append(team_row)

Output:

['Team Jackson (0-10)', '', '.4510', '.8375', '41', '135', '101', '23', '11', '50', '384', '', '5-4-0']
['YOU REACH, I TEACH (3-6-1)', '', '.4684', '.7907', '22', '169', '103', '22', '10', '32', '342', '', '4-5-0']
['Nutz Vs. Draymond Green (4-6)', '', '.4552', '.8372', '30', '157', '68', '15', '16', '39', '356', '', '2-7-0']
["Jesse's  Blue Balls (4-5-1)", '', '.4609', '.7576', '47', '158', '71', '30', '20', '38', '333', '', '7-2-0']
['Team Noey (4-6)', '', '.4763', '.8261', '42', '164', '70', '25', '29', '44', '480', '', '5-4-0']
['Suck MyDirk (6-3-1)', '', '.4733', '.8403', '54', '160', '132', '23', '11', '47', '544', '', '4-5-0']
['Kris Kaman  His Pants (5-5)', '', '.4569', '.8732', '53', '138', '105', '27', '21', '53', '465', '', '6-3-0']
['Team Aguilar (6-3-1)', '', '.4433', '.7229', '40', '202', '68', '30', '22', '54', '452', '', '3-6-0']
['Knicks Tape (6-3-1)', '', '.4406', '.8824', '52', '172', '108', '24', '13', '49', '513', '', '6-3-0']
['Team Iannetta (4-6)', '', '.5321', '.6923', '24', '146', '94', '32', '16', '60', '428', '', '3-6-0']
['In Porzingod We Trust (6-4)', '', '.4694', '.6364', '37', '216', '133', '31', '21', '77', '468', '', '4-5-0']
['Team Keyrouze (6-4)', '', '.4705', '.8854', '51', '135', '108', '25', '17', '43', '550', '', '5-4-0']
['Team Li (8-1-1)', '', '.4369', '.8182', '57', '203', '130', '34', '22', '54', '525', '', '6-3-0']
['Team Pauls 2 da Wall (5-5)', '', '.4780', '.5970', '27', '141', '47', '19', '25', '28', '263', '', '3-6-0']
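
Each row above contains a couple of empty strings, presumably from empty spacer cells in the page. To get closer to the tab-separated layout asked for in the question, a minimal sketch could drop those cells and join the rest. The helper name `format_rows` and the `HEADER` list are assumptions for illustration (the column names are taken from the question's expected output), and the two sample rows are copied from the output above so the snippet runs without a network request.

```python
# Column names taken from the expected output in the question.
HEADER = ["NAME", "FG%", "FT%", "3PM", "REB", "AST", "STL", "BLK", "TO", "PTS", "SCORE"]


def format_rows(teams):
    """Hypothetical helper: drop empty cells and join each row with tabs."""
    lines = ["\t".join(HEADER)]
    for team_row in teams:
        # Keep only cells that still contain text after stripping whitespace.
        cells = [cell.strip() for cell in team_row if cell.strip()]
        lines.append("\t".join(cells))
    return lines


# Two rows copied from the output above, standing in for the full teams list.
teams = [
    ['Team Jackson (0-10)', '', '.4510', '.8375', '41', '135', '101', '23', '11', '50', '384', '', '5-4-0'],
    ['Team Li (8-1-1)', '', '.4369', '.8182', '57', '203', '130', '34', '22', '54', '525', '', '6-3-0'],
]

table = format_rows(teams)
print("\n".join(table))
```

Filtering on empty text is a simplification: it would also drop a stat cell that happened to be blank, so selecting cells by position may be safer if that can occur.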
Score: 0

Stack Overflow user

Posted 2016-12-23 18:31:31

Instead of looking for the table tag, look for the rows directly using a more reliable class such as linescoreTeamRow. This snippet did the trick:

from bs4 import BeautifulSoup
import requests
a = requests.get("http://games.espn.com/fba/scoreboard?leagueId=224165&seasonId=2017")
soup = BeautifulSoup(a.text, 'lxml')
# searching for the rows directly
rows = soup.findAll('tr', {'class': 'linescoreTeamRow'})
# you will need to isolate elements in the row for the table
for row in rows:
    print(row.text)
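
As for why the question's loop printed so little: `soup.find` returns only the first matching table, and BeautifulSoup defines `.string` as `None` for any tag that has more than one child, so a `tr` with several `td` cells yields nothing. A small sketch of the difference, and of isolating the cells per row, is below. The HTML is made-up markup standing in for the real page; only the class names `tableBody` and `linescoreTeamRow` come from this thread, and `html.parser` is used here just to avoid needing lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the class names
# tableBody and linescoreTeamRow come from the question and answers.
html = """
<table class="tableBody">
  <tr><td>TEAM</td></tr>
  <tr class="linescoreTeamRow"><td>Team Li (8-1-1)</td><td>.4369</td><td>.8182</td></tr>
</table>
<table class="tableBody">
  <tr class="linescoreTeamRow"><td>Team Noey (4-6)</td><td>.4763</td><td>.8261</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The question's approach: find() returns only the first matching table,
# and .string is None for a row that has more than one child.
first_table = soup.find("table", {"class": "tableBody"})
strings = [row.string for row in first_table.find_all("tr")]

# Searching for the rows directly covers every table on the page;
# get_text() on each td isolates the individual cells.
rows = soup.find_all("tr", {"class": "linescoreTeamRow"})
cells = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows]

print(strings)
print(cells)
```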
Score: 3
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/41305852
