Q: List index error when parsing HTML with bs4 and pandas

Stack Overflow user

Asked 2018-07-18 04:33:05

1 answer · 359 views · 0 followers · 1 vote

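I have a directory with many subdirectories, some of which contain HTML files. Every time I find an HTML file, I want to parse its data into a pandas DataFrame. The problem is that I get the error "IndexError: list index out of range". My code looks like this: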

import os

import pandas as pd
from bs4 import BeautifulSoup
from tabulate import tabulate

# walk_dir is the top-level directory to scan (defined elsewhere)
for root, subdirs, files in os.walk(walk_dir):
    list_file_path = os.path.join(root, 'my-directory-list.txt')

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            for filename in files:
                file_path = os.path.join(root, filename)

                with open(file_path, 'rb') as f:
                    f_content = f.read()

                    # Check if file is html
                    check_html = bool(BeautifulSoup(f_content, "lxml").find())

                    # If it is HTML
                    if check_html == True:
                        print("It's html")
                        soup = BeautifulSoup(f_content, 'lxml')
                        table = soup.find_all('table')[0]  # IndexError is raised here
                        df = pd.read_html(str(table))
                        print(tabulate(df[0]))

                    # If it is not HTML
                    else:
                        print("its not")

I know the error comes from the line table = soup.find_all('table')[0], but I don't know how to handle it.

Also, the HTML files I am trying to parse usually look like this (first lines):

<!DOCTYPE html> <html class="no-js"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> <title>   text1 </title>

In some cases they also contain tables. Any help would be appreciated.

python

html

pandas

beautifulsoup

1 Answer

Stack Overflow user

Posted 2018-07-18 04:41:30

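soup.find_all('table') is probably returning an empty list, so indexing it with [0] raises the IndexError. As you said, the files sometimes contain tables and sometimes do not, so you have to handle the case where there are none. A simple check such as if tables: before indexing may be enough.

if check_html:
    print("It's html")
    soup = BeautifulSoup(f_content, 'lxml')
    tables = soup.find_all('table')
    if tables:
        df = pd.read_html(str(tables[0]))
        print(tabulate(df[0]))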
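To make the failure mode concrete: find_all always returns a list, and that list is empty when the document has no matching tag, so the IndexError comes from indexing an empty list rather than from a None value. The following is only a minimal sketch of the guard, not the asker's full directory walk. The helper name first_table_frame and the two sample HTML strings are hypothetical, and it assumes lxml is installed, as in the question.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup


def first_table_frame(html):
    """Return the first <table> in `html` as a DataFrame, or None if there is none.

    Hypothetical helper for illustration; not part of bs4 or pandas.
    """
    soup = BeautifulSoup(html, "lxml")
    tables = soup.find_all("table")  # always a list, possibly empty
    if not tables:
        return None
    # read_html returns a list of DataFrames; wrapping the markup in StringIO
    # avoids the literal-string deprecation warning in newer pandas versions.
    return pd.read_html(StringIO(str(tables[0])))[0]


# Made-up sample inputs: one page with a table, one without.
with_table = (
    "<html><body><table>"
    "<tr><th>a</th><th>b</th></tr>"
    "<tr><td>1</td><td>2</td></tr>"
    "</table></body></html>"
)
without_table = "<html><head><title>text1</title></head><body><p>no table</p></body></html>"

print(first_table_frame(with_table))
print(first_table_frame(without_table))
```

Inside the loop from the question, the same check would let files without tables be skipped instead of stopping the whole walk. An alternative is to call pd.read_html on the whole document and catch the ValueError it raises when no tables are found.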

0 votes

Original content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/51389783
