I want to use pandas to extract all of the table content from a web page.
Here is the code:
import pandas as pd
from bs4 import BeautifulSoup
link = "http://vip.win007.com/AsianOdds_n.aspx?id=1957300"
macau_asianodds_list = list()
asianodds = pd.read_html(link, flavor='bs4', header=0)
asianodds[0]
df_NaN = asianodds[0]
asianodds = df_NaN.drop(df_NaN[df_NaN.多盘口 == '多盘口'].index)
asianodds.drop('多盘口', inplace=True, axis=1)
asianodds.drop('历史资料', inplace=True, axis=1)
df1 = asianodds.iloc[0:1]
df2 = asianodds.iloc[1:2]
df3 = asianodds.iloc[2:3]
df4 = asianodds.iloc[3:4]
df5 = asianodds.iloc[4:5]
df6 = asianodds.iloc[5:6]
df7 = asianodds.iloc[6:7]
df8 = asianodds.iloc[7:8]
df9 = asianodds.iloc[8:9]
df10 = asianodds.iloc[9:10]
df11 = asianodds.iloc[10:11]
df12 = asianodds.iloc[11:12]
df13 = asianodds.iloc[12:13]
df14 = asianodds.iloc[13:14]
macau_asianodds = pd.concat([df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,
df11,df12,df13,df14], axis=1, sort=False)
macau_asianodds.to_excel("c:/logs/history/test.xls",index=False)
df1, df2, df3, ... do not end up in the same row of the Excel file. I have modified the code.
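One possible reason for this layout, sketched here with made-up data rather than the real page: pd.concat(..., axis=1) aligns its inputs on their index labels, so single-row slices that keep their original labels (0, 1, 2, ...) land in different rows. The column names and values below are hypothetical, and resetting the index is only one way to line the pieces up.

```python
import pandas as pd

# Made-up stand-in for the scraped odds table (hypothetical columns and values).
odds = pd.DataFrame({"company": ["A", "B", "C"], "home": [0.85, 0.90, 0.95]})

# One single-row frame per row, like df1, df2, df3 in the question.
rows = [odds.iloc[i:i + 1] for i in range(len(odds))]

# axis=1 aligns on the index, so labels 0, 1, 2 become three separate rows
# and every other cell is filled with NaN.
staggered = pd.concat(rows, axis=1, sort=False)

# Giving every piece the same index label (0) places them side by side in one row.
one_row = pd.concat([r.reset_index(drop=True) for r in rows], axis=1)

print(staggered.shape)
print(one_row.shape)
```

If this is the cause, writing one_row with to_excel should produce a single row; whether it matches the intended output depends on the real table.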
Posted on 2020-12-09 17:11:09
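See the documentation for read_html(), in particular the "flavor" parameter. It can be 'lxml', 'bs4', 'html5lib', and so on.
flavor='bs4' gives me all the rows.
flavor='html5lib' gives all the rows plus a lot of extra information...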
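A minimal sketch for comparing flavors side by side, assuming a small made-up HTML table instead of the real page (the helper name shapes_by_flavor and the sample data are hypothetical). It parses the same HTML with each flavor and records the shape of every table found, or the error if a parser library is missing.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for the real page (assumption: one simple table).
HTML = """
<table>
  <tr><th>company</th><th>home</th></tr>
  <tr><td>A</td><td>0.85</td></tr>
  <tr><td>B</td><td>0.90</td></tr>
</table>
"""


def shapes_by_flavor(html, flavors=("lxml", "bs4", "html5lib")):
    """Parse the same HTML with each flavor and record the shape of every table."""
    results = {}
    for flavor in flavors:
        try:
            tables = pd.read_html(StringIO(html), flavor=flavor, header=0)
            results[flavor] = [t.shape for t in tables]
        except (ImportError, ValueError) as exc:
            # ImportError: the parser library for this flavor is not installed.
            # ValueError: no tables were found with this flavor.
            results[flavor] = f"failed: {exc}"
    return results


for flavor, outcome in shapes_by_flavor(HTML).items():
    print(flavor, outcome)
```

Passing the downloaded page source instead of HTML would show whether the flavors really differ for that page. Note that the pandas documentation describes 'bs4' and 'html5lib' as synonymous, so any difference between those two may depend on the installed versions.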
https://stackoverflow.com/questions/65221221