我已经编写了代码,从网站上抓取一个表格,并从该表格中拉出每个单元格,然后将它们放入excel电子表格中。当网站正确加载时,代码可以完美地工作。
问题是网站不能很好地使用internet explorer,因此代码只有大约一半的时间可以成功执行。我可以写一个例程来检查网站是否加载成功,如果没有,就重复一次,但是我想看看是否可以让它与WinHTTPRequest一起工作。
下面几行是我如何使用基于internet explorer的webscraping访问表格,最后一行是我如何将表格加载到一个变量中。
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
我可以使用下面的代码通过WinHTTPRequest加载有问题的网站。
Set doc = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
doc.body.innerHTML = .responseText
End With
但是,当我尝试使用下面这行代码抓取表时,我得到“运行时错误'438':Object不支持此属性或方法”。
Set WeatherTable = doc.getElementByclassname("twc-table")(0)
基本上,我需要WinHTTP网络抓取的等价物。
我已经研究了向下遍历html文档(doc.body.all.item(1)等),但在遇到错误之前并没有走多远。我也看过Selenium插件,但我不记得是否能够成功下载和安装它,我甚至不确定它是否仍然适用于当前版本的chrome / firefox。
这是完整的代码,允许我通过internet explorer webscraping获取表格,然后将其放到excel电子表格中。
任何帮助都是非常感谢的。
Sub GetTable2()
Dim IE As Object
Dim doc As HTMLDocument
Dim WeatherTable As HTMLTable
Dim WeatherTableRows As HTMLTableRow
Dim HTMLTableCell As HTMLTableCell
Dim HeaderRow As Boolean
Dim RowCount As Long
Dim ColumnCount As Long
Dim i As Long
RowCount = 1
ColumnCount = 1
HeaderRow = True
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
'Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
For Each WeatherTableRows In WeatherTable.Rows
i = 1
For Each HTMLTableCell In WeatherTableRows.Cells
If HeaderRow = True Then
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
Else
If i = 1 Then
i = i + 1
Else
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
End If
End If
Next HTMLTableCell
HeaderRow = False
ColumnCount = 1
RowCount = RowCount + 1
Next WeatherTableRows
IE.Quit
Set IE = Nothing
Set doc = Nothing
End Sub
发布于 2018-06-09 18:32:01
你错过了一个s
。它是复数,因为您将通过className获得一个元素s
的集合。
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
发布于 2018-06-09 19:51:42
为了使您的方法更简洁,您也可以尝试这种方式。
Sub FetchTabularData()
Dim elem As Object, trow As Object, S$, R&, C&
[B1:G1] = [{"Day","Description","High/Low","Precip","Wind","Humidity"}]
With New WinHttp.WinHttpRequest
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
For Each elem In .querySelector(".twc-table").getElementsByTagName("tr")
For Each trow In elem.getElementsByTagName("td")
C = C + 1: Cells(R + 1, C) = trow.innerText
Next trow
C = 0: R = R + 1
Next elem
End With
End Sub
要添加的引用:
Microsoft HTML Object Library
Microsoft WinHTTP Services, version 5.1
https://stackoverflow.com/questions/50777294
复制相似问题