I'm trying to build a product list from Amazon's unique product codes.
For example: https://www.amazon.in/gp/product/B00F2GPN36
Here, B00F2GPN36 is the unique code.
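Every product page in the example follows the same pattern, so each page URL can be built from its code alone. This is a minimal sketch. It assumes all codes use the amazon.in `/gp/product/` path shown above, and `ProductUrl` is a hypothetical helper name:

```vba
Option Explicit

' Hypothetical helper: builds a product-page URL from an Amazon product code.
' Assumes the amazon.in "/gp/product/<code>" pattern from the example URL.
Public Function ProductUrl(ByVal code As String) As String
    ' Trim$ strips stray leading/trailing spaces copied in from a cell
    ProductUrl = "https://www.amazon.in/gp/product/" & Trim$(code)
End Function
```

If a Public Function like this is placed in a standard module, it can also be called from a cell, for example `=ProductUrl(A2)`.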
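I want to extract each product's image and title into an Excel list, under "Product Image" and "Product Name" columns.
I have tried html.getElementsById("productTitle") and html.getElementsByTagName.
I also don't know which variable type to use to store this information. I have tried declaring it both as Object and as HTMLHtmlElement.
I'm trying to fetch the HTML document and search it for the data.
Code: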
Enum READYSTATE
    READYSTATE_UNINITIALIZED = 0
    READYSTATE_LOADING = 1
    READYSTATE_LOADED = 2
    READYSTATE_INTERACTIVE = 3
    READYSTATE_COMPLETE = 4
End Enum

Sub parsehtml()
    Dim ie As InternetExplorer
    Dim topics As Object
    Dim html As HTMLDocument
    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate "https://www.amazon.in/gp/product/B00F2GPN36"
    Do While ie.READYSTATE <> READYSTATE_COMPLETE
        Application.StatusBar = "Trying to go to Amazon.in...."
        DoEvents
    Loop
    Application.StatusBar = ""
    Set html = ie.document
    Set topics = html.getElementsById("productTitle")
    Sheets(1).Cells(1, 1).Value = topics.innerText
    Set ie = Nothing
End Sub
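I expect the following output in cell A1, without the quotes:
"Milton Thermosteel Bottle, 2 Litres, Silver". I also want to pull the image in the same way.
However, I keep getting errors. For example, when I use "Dim topics As HTMLHtmlElement" I get:
Object doesn't support this property or method
Note: I have added the required library references via Tools > References.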
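Posted on 2019-06-02 22:04:03
A faster approach is to use XHR. This avoids the browser entirely, and the script then writes the results from an array to the sheet. Note that the script below calls a DownloadFile helper that is not included here.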
Option Explicit

Public Sub GetInfo()
    Dim html As HTMLDocument, results()
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.amazon.in/gp/product/B00F2GPN36", False
        .send
        html.body.innerHTML = .responseText
        With html
            results = Array(.querySelector("#productTitle").innerText, .querySelector("#landingImage").getAttribute("data-old-hires"))
        End With
    End With
    With ThisWorkbook.Worksheets("Sheet1")
        .Cells(1, 1) = results(0)
        Dim file As String
        file = DownloadFile("C:\Users\User\Desktop\", results(1)) ' your download folder path
        With .Pictures.Insert(file)
            .Left = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Left
            .Top = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Top
            .Width = 75
            .Height = 100
            .Placement = 1
        End With
    End With
    Kill file
End Sub
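The answer above calls a `DownloadFile` helper that is not shown. The sketch below is one possible stand-in. It assumes Windows and uses the `URLDownloadToFile` function from `urlmon.dll`. Both the body of `DownloadFile` and the `FileNameFromUrl` helper are hypothetical; they were chosen only to match the call `DownloadFile(folder, url)` above. The sketch returns an empty string when the download fails, so the caller should check for that before calling `Pictures.Insert`.

```vba
Option Explicit

' Win32 API from urlmon.dll; the VBA7 branch covers 64-bit Office.
#If VBA7 Then
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _
        (ByVal pCaller As LongPtr, ByVal szURL As String, ByVal szFileName As String, _
         ByVal dwReserved As LongPtr, ByVal lpfnCB As LongPtr) As Long
#Else
    Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _
        (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
         ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If

' Hypothetical helper: returns the last path segment of a URL, with any query string removed.
Public Function FileNameFromUrl(ByVal url As String) As String
    Dim q As Long
    q = InStr(url, "?")
    If q > 0 Then url = Left$(url, q - 1)
    FileNameFromUrl = Mid$(url, InStrRev(url, "/") + 1)
End Function

' Hypothetical stand-in for the DownloadFile helper used above.
' folder must end with "\". Returns the full local path, or "" if the download failed.
Public Function DownloadFile(ByVal folder As String, ByVal url As String) As String
    Dim target As String
    target = folder & FileNameFromUrl(url)
    ' URLDownloadToFile returns 0 (S_OK) on success
    If URLDownloadToFile(0, url, target, 0, 0) = 0 Then
        DownloadFile = target
    Else
        DownloadFile = vbNullString
    End If
End Function
```

In some Excel versions, `Pictures.Insert` may store a link to the file rather than a copy. In that case, the `Kill file` at the end of `GetInfo` could leave a broken image. `Shapes.AddPicture` with `SaveWithDocument:=msoTrue` explicitly embeds the picture instead.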
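Posted on 2019-06-02 21:10:20
There is no such method as html.getElementsById("productTitle") in VBA. An ID is always unique, so the method is singular: html.getElementById("productTitle"). Run the following script to get both the title and the image: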
Sub ParseHtml()
    Dim IE As New InternetExplorer, elem As Object
    Dim Html As HTMLDocument, imgs As Object
    With IE
        .Visible = False
        .navigate "https://www.amazon.in/gp/product/B00F2GPN36"
        ' Wait until the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With
    Set elem = Html.getElementById("productTitle")
    Set imgs = Html.getElementById("landingImage")
    Sheets(1).Cells(1, 1) = elem.innerText
    Sheets(1).Cells(1, 1).Offset(0, 1) = imgs.getAttribute("data-old-hires")
    ' Close the hidden browser so it does not stay running in the background
    IE.Quit
End Sub
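This part addresses the variable-type question. `getElementById` returns a single element, or `Nothing` if no element has that ID. `getElementsByTagName` returns a collection, which must be indexed or looped over. The sketch below shows both declarations on a hard-coded HTML string, so it runs without a network call. It assumes a reference to Microsoft HTML Object Library, and `TitleFromHtml` and `ImageCount` are hypothetical names.

```vba
Option Explicit

' Requires Tools > References > "Microsoft HTML Object Library" (MSHTML).

' getElementById returns one element (or Nothing), so store it in an IHTMLElement.
Public Function TitleFromHtml(ByVal source As String) As String
    Dim doc As MSHTML.HTMLDocument
    Dim titleEl As MSHTML.IHTMLElement
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = source
    Set titleEl = doc.getElementById("productTitle")
    ' Trim$ removes leading/trailing spaces; returns "" when the ID is absent
    If Not titleEl Is Nothing Then TitleFromHtml = Trim$(titleEl.innerText)
End Function

' getElementsByTagName returns a collection, so store it in an IHTMLElementCollection.
Public Function ImageCount(ByVal source As String) As Long
    Dim doc As MSHTML.HTMLDocument
    Dim imgs As MSHTML.IHTMLElementCollection
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = source
    Set imgs = doc.getElementsByTagName("img")
    ' Individual items would be read as imgs(0), imgs(1), ...
    ImageCount = imgs.Length
End Function
```

Calling a property such as `innerText` directly on the collection, rather than on one of its items, is one way to trigger the "Object doesn't support this property or method" error.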
https://stackoverflow.com/questions/56415063