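I am trying to scrape data from a set of web pages like this one: https://www.cookcountyassessor.com/pin/14333230200000/print
Most of the data appears to be held in elements that share a single CSS class, "detail-row--detail", which occurs many times as siblings (the data labels are in "detail-row--label"). So the first data item is in detail-row--detail:eq(0), the second in detail-row--detail:eq(1), and so on. My VBA gets the first detail-row--detail but none of the subsequent ones.
Below is a simplified snippet of my code. The cell TargetURL contains the URL above. The range CSSRange contains 3 values: "print-pint", "address" and "detail-row--detail" (without the quotes). The MsgBox (for testing only) correctly returns the values for the first 2 CSSRange items (which have no multiple siblings). For the third CSS item (which has 31 siblings), it runs the For Each loop the correct number of times but returns the value of the first sibling every time. Any suggestions for how to get the values of the subsequent siblings?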
Sub SnippetForStackOverflow()
    'Be sure to load Tools > References "Microsoft Internet Controls" & "Microsoft HTML Object Library"
    Dim ShtSource As Worksheet
    Dim CSSRange As Range
    Dim TargetURL As Range
    Dim rng As Range
    Dim n As Integer
    Dim webpage As HTMLDocument
    Dim element As IHTMLElement
    Dim Output As String
    Dim ie As InternetExplorer

    'Get things ready
    Set ShtSource = Sheets("PINforVBA")
    Set TargetURL = ShtSource.Range("$B$2")
    Set CSSRange = ShtSource.Range("$B$5:$B$7")

    'Open IE in memory, go to site
    Set ie = New InternetExplorer
    ie.Visible = True 'SET AS FALSE UNLESS DEBUGGING
    ie.navigate (TargetURL.Value)
    Do While ie.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Loading Web page …"
        DoEvents
    Loop
    Set webpage = ie.document

    'Scrape desired elements
    For Each rng In CSSRange
        For Each element In webpage.getElementsByClassName(rng.Value)
            n = n + 1
            Output = webpage.getElementsByClassName(rng.Value)(0).innerText
            MsgBox (n & ": " & Output)
        Next
        n = 0
    Next

    'Wrap it up
    ie.Quit
    Set ie = Nothing
End Sub
Posted on 2020-10-03 05:58:25
I have commented what I changed in the code. That includes resetting the status bar.
Sub SnippetForStackOverflow()
    'Be sure to load Tools > References "Microsoft Internet Controls" & "Microsoft HTML Object Library"
    Dim ShtSource As Worksheet
    Dim CSSRange As Range
    Dim TargetURL As Range
    Dim rng As Range
    Dim n As Integer
    Dim webpage As HTMLDocument
    Dim element As IHTMLElement
    Dim Output As String
    Dim ie As InternetExplorer

    'Get things ready
    Set ShtSource = Sheets("PINforVBA")
    Set TargetURL = ShtSource.Range("$B$2")
    Set CSSRange = ShtSource.Range("$B$5:$B$7")

    'Open IE in memory, go to site
    Set ie = New InternetExplorer
    ie.Visible = True 'SET AS FALSE UNLESS DEBUGGING
    ie.navigate (TargetURL.Value)
    Do While ie.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Loading Web page …"
        DoEvents
    Loop
    Set webpage = ie.document

    'Scrape desired elements
    For Each rng In CSSRange
        For Each element In webpage.getElementsByClassName(rng.Value)
            n = n + 1
            'The original line below queried the whole document again and took index 0,
            'which is why the first element was returned on every pass:
            'Output = webpage.getElementsByClassName(rng.Value)(0).innerText
            'Reading from the loop variable returns the current element instead
            Output = element.innerText
            MsgBox (n & ": " & Output)
        Next
        n = 0
    Next

    'Wrap it up
    ie.Quit
    Set ie = Nothing

    'Reset status bar
    Application.StatusBar = False
End Sub
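Since the question mentions that the labels live in "detail-row--label" and the values in "detail-row--detail", an index-based loop may also be useful, because it makes the position of each match explicit. The following is only a sketch: ListLabelDetailPairs is a hypothetical helper name, and it assumes that labels and details appear on the page in the same order, which has not been verified against the site.

```vba
'Hypothetical helper, not part of the original code.
'Requires the reference "Microsoft HTML Object Library".
Sub ListLabelDetailPairs(ByVal webpage As HTMLDocument)
    Dim labels As IHTMLElementCollection
    Dim details As IHTMLElementCollection
    Dim i As Long

    Set labels = webpage.getElementsByClassName("detail-row--label")
    Set details = webpage.getElementsByClassName("detail-row--detail")

    'The collections are zero-based, so Item(i) is the match at position i.
    'Using a fixed index such as (0) would return the first match every time.
    For i = 0 To details.Length - 1
        If i < labels.Length Then
            'ASSUMPTION: label i belongs to detail i
            Debug.Print labels.Item(i).innerText & ": " & details.Item(i).innerText
        Else
            Debug.Print "(no label): " & details.Item(i).innerText
        End If
    Next i
End Sub
```

It could be called as ListLabelDetailPairs webpage right after Set webpage = ie.document; the output goes to the Immediate window instead of a MsgBox.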
https://stackoverflow.com/questions/64163199