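Below are the variable declarations for the 3 request types I have implemented in my macros. In the comments I have listed the library each one uses and its late-binding equivalent: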
Dim XMLHTTP As New MSXML2.XMLHTTP 'Microsoft XML, v6.0 'Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
Dim ServerXMLHTTP As New MSXML2.ServerXMLHTTP 'Microsoft XML, v6.0 'Set ServerXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim http As New WinHttpRequest 'Microsoft WinHttp Services, version 5.1 'Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
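As a minimal sketch of the late-binding route mentioned in the comments above: all three objects expose Open, send, Status and responseText, so one helper can be reused with any of the three ProgIDs. The names HttpGetLateBound and DemoLateBound are hypothetical and are not part of the original macros.

```vba
' Hypothetical helper: synchronous GET through a late-bound request object.
' progId is assumed to be one of the three ProgIDs from the comments above.
Public Function HttpGetLateBound(ByVal progId As String, ByVal url As String) As String
    Dim req As Object

    Set req = CreateObject(progId)      ' no project reference needed
    req.Open "GET", url, False          ' False = synchronous call
    req.send
    If req.Status = 200 Then
        HttpGetLateBound = req.responseText
    End If
End Function

Sub DemoLateBound()
    Dim html As String

    html = HttpGetLateBound("MSXML2.XMLHTTP.6.0", _
        "http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/")
    Debug.Print Len(html) & " chars"
End Sub
```

Swapping the ProgID string is then enough to compare the three request types on the same URL.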
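I have several old web-scraping macros that use Internet Explorer automation. I want to clean up the code and speed them up with these requests.
Unfortunately, I've noticed that MSXML2.ServerXMLHTTP and WinHttpRequest are slower (34 and 35 seconds) than IE automation (24 seconds) on a 20-product test against an online store, while MSXML2.XMLHTTP finished in 18 seconds. I have seen cases where some of these 3 request types are 2-3 times faster than the others, so I always test which one performs best, but I have never had any of them lose to IE automation before.
The main page containing the results is below. All the results are on a single page, 1500+ of them, so the request takes some time (the page would fill about 6500 pages if pasted into MS Word):
http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400
The individual links from the main results page are then opened, for example: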
http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/
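I would like to know whether these 3 request types are all the options I have for getting data from websites without browser automation. Also, how can browser automation beat some of these requests?
Update
I have tested the main results page with the provided procedures, clearing the IE cache before each run. At least on this particular page, caching seems to bring no obvious benefit, since subsequent requests produced similar results. IE had active scripting disabled and did not load images.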
IE automation method, Document length: 7593346 chars, Processed in: 8 seconds
WinHTTP method, Document length: 7824059 chars, Processed in: 29 seconds
XML HTTP method, Document length: 7830217 chars, Processed in: 4 seconds
Server XML HTTP method, Document length: 7823958 chars, Processed in: 26 seconds
URL download file method, Document length: 7830346 chars, Processed in: 7 seconds
To my surprise, the methods returned different numbers of characters.
Posted on 2017-01-11 07:49:47
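In addition to the methods you mentioned, there are two others you could consider:
the createDocumentFromUrl method of the MSHTML.HTMLDocument object
the Windows API function URLDownloadToFileA
I have left out some other Windows API functions, such as InternetOpen, InternetOpenUrl, etc., because the complexity of guessing the response length, buffering the response and so on would outweigh the potential performance gain.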
CreateDocumentFromUrl
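The createDocumentFromUrl method is a problem for the sample website, because it tries to create an HTMLDocument inside a frame, which the site does not allow. It fails with errors such as:
Framing Forbidden
and
To help protect the security of information you enter into this website, the publisher of this content does not allow it to be displayed in a frame.
So we should not use this method.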
URLDownloadToFileA
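I thought you needed an equivalent of PHP's file_get_contents, and found this method. It is easy to use and outperforms the other methods on a large request (e.g. try it when you are going for more than 2000 baseball bats). The XMLHTTP method also uses the URLMon library, so I guess this approach just cuts out some of the middle-man logic. The obvious downside is that you have to do some file system handling.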
#If VBA7 Then
' Office 2010 and later: PtrSafe is mandatory on 64-bit hosts
Private Declare PtrSafe Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As LongPtr, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr) As Long
#Else
Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long
#End If

Sub TestUrlDownloadFile(strUrl As String)
    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strTempFileName As String
    Dim strResponse As String
    Dim objFso As FileSystemObject

    On Error GoTo ExitFunction

    dteStart = Now
    strTempFileName = "D:\foo.txt"   ' adjust to a writable path on your machine
    DownloadFile strUrl, strTempFileName
    Set objFso = New FileSystemObject
    With objFso.OpenTextFile(strTempFileName, ForReading)
        strResponse = .ReadAll
        .Close
    End With
    objFso.DeleteFile strTempFileName
    dteFinish = Now

    Debug.Print "URL download file method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    ' DateDiff gives total elapsed seconds; Format(..., "s") wraps at 60
    Debug.Print "Processed in: " & DateDiff("s", dteStart, dteFinish) & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
    Dim lngRetVal As Long

    ' URLDownloadToFileA returns 0 (S_OK) on success
    lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
    If lngRetVal = 0 Then DownloadFile = True
End Function
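The listing above writes to a fixed path (D:\foo.txt). As a hedged sketch, the same idea can use a uniquely named file in the user's temp folder instead. It also clears any cached copy of the URL first, on the assumption that URLDownloadToFileA may otherwise be served from the WinINet cache, which could flatter its timing. The names ApiUrlDownload, ApiDeleteCacheEntry and FetchViaTempFile are hypothetical. The aliases are there only so the sketch does not clash with the Declare above if both sit in one module, and the Declare lines must go at the top of the module.

```vba
#If VBA7 Then
Private Declare PtrSafe Function ApiUrlDownload Lib "urlmon" Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
Private Declare PtrSafe Function ApiDeleteCacheEntry Lib "wininet" Alias "DeleteUrlCacheEntryA" ( _
    ByVal lpszUrlName As String) As Long
#Else
Private Declare Function ApiUrlDownload Lib "urlmon" Alias "URLDownloadToFileA" ( _
    ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Private Declare Function ApiDeleteCacheEntry Lib "wininet" Alias "DeleteUrlCacheEntryA" ( _
    ByVal lpszUrlName As String) As Long
#End If

' Hypothetical helper: download url to a temp file, read it back, delete the file.
' Returns an empty string if the download fails.
Public Function FetchViaTempFile(ByVal url As String) As String
    Dim fso As Object
    Dim tempPath As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    ' 2 = TemporaryFolder; GetTempName returns a random file name
    tempPath = fso.BuildPath(fso.GetSpecialFolder(2).Path, fso.GetTempName)

    ApiDeleteCacheEntry url     ' drop a cached copy if one exists; result ignored

    If ApiUrlDownload(0, url, tempPath, 0, 0) = 0 Then   ' 0 = S_OK
        With fso.OpenTextFile(tempPath, 1)               ' 1 = ForReading
            If Not .AtEndOfStream Then FetchViaTempFile = .ReadAll
            .Close
        End With
        fso.DeleteFile tempPath
    End If
End Function
```

The FileSystemObject is late-bound here, so this sketch needs no extra project reference.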
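With URLDownloadToFileA it takes me about 1-2 seconds to download the sample URL, versus 4-5 seconds with the XMLHTTP method (full code below).
URL:
http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400
This is the output: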
Testing...
XML HTTP method
Document length: 7869753 chars
Processed in: 4 seconds
URL download file method
Document length: 7869753 chars
Processed in: 1 seconds
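The timings above are whole seconds because Now has one-second resolution, which makes a 1-second versus 2-second result hard to interpret. A minimal sketch of finer-grained timing with the built-in Timer function follows. TimeRequest is a hypothetical name, and the late-bound XMLHTTP object is only one possible choice of request type.

```vba
' Hypothetical helper: times one synchronous GET with sub-second resolution.
Sub TimeRequest(ByVal url As String)
    Dim req As Object
    Dim sngStart As Single
    Dim sngElapsed As Single

    Set req = CreateObject("MSXML2.XMLHTTP.6.0")

    sngStart = Timer                 ' fractional seconds since midnight
    req.Open "GET", url, False       ' synchronous, so send blocks until done
    req.send
    sngElapsed = Timer - sngStart
    If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400   ' run crossed midnight

    Debug.Print "Document length: " & Len(req.responseText) & " chars"
    Debug.Print "Processed in: " & Format$(sngElapsed, "0.00") & " seconds"
End Sub
```

Running each method several times and comparing the spread gives a fairer picture than a single run.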
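Code
This includes all the methods discussed, i.e. IE automation, WinHttpRequest, XMLHTTP, ServerXMLHTTP, createDocumentFromUrl and URLDownloadToFileA.
You need all of these references in your project. Going by the types used in the code, they are: Microsoft Internet Controls, Microsoft HTML Object Library, Microsoft WinHTTP Services, version 5.1, Microsoft XML, v6.0, and Microsoft Scripting Runtime.
Here it is: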
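Option Explicit

#If VBA7 Then
' Office 2010 and later: PtrSafe is mandatory on 64-bit hosts
Private Declare PtrSafe Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As LongPtr, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr) As Long
#Else
Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long
#End If

Sub Test()
    Dim strUrl As String

    strUrl = "http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400"
    Debug.Print "Testing..."
    Debug.Print VBA.vbNewLine

    ' Uncomment the methods you want to compare
    'TestIE strUrl
    'TestWinHHTP strUrl
    TestXMLHTTP strUrl
    'TestServerXMLHTTP strUrl
    'TestCreateDocumentFromUrl strUrl
    TestUrlDownloadFile strUrl
End Sub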

Sub TestIE(strUrl As String)
    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objIe As InternetExplorer
    Dim objHtml As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objIe = New SHDocVw.InternetExplorer
    With objIe
        .navigate strUrl
        .Visible = False
        ' Wait until the page has finished loading
        While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Wend
        Set objHtml = .document
        strResponse = objHtml.DocumentElement.outerHTML
    End With
    dteFinish = Now

    Debug.Print "IE automation method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & DateDiff("s", dteStart, dteFinish) & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    ' Quit only here: a second Quit on an already closed instance raises an error
    If Not objIe Is Nothing Then
        objIe.Quit
    End If
    Set objIe = Nothing
End Sub

Sub TestWinHHTP(strUrl As String)
    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objHttp As WinHttp.WinHttpRequest
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objHttp = New WinHttp.WinHttpRequest
    With objHttp
        .Open "get", strUrl, False
        ' Ask intermediaries not to serve a cached copy
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        .WaitForResponse
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "WinHTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & DateDiff("s", dteStart, dteFinish) & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objHttp = Nothing
End Sub

Sub TestXMLHTTP(strUrl As String)
    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.XMLHTTP60
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.XMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        ' The call is synchronous, so this loop is only a safety net
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & DateDiff("s", dteStart, dteFinish) & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objXhr = Nothing
End Sub

Sub TestServerXMLHTTP(strUrl As String)
    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.ServerXMLHTTP60
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.ServerXMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        ' The call is synchronous, so this loop is only a safety net
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "Server XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & DateDiff("s", dteStart, dteFinish) & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objXhr = Nothing
End Sub

Sub TestUrlDownloadFile(strUrl As String)
    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strTempFileName As String
    Dim strResponse As String
    Dim objFso As FileSystemObject

    On Error GoTo ExitFunction

    dteStart = Now
    strTempFileName = "D:\foo.txt"   ' adjust to a writable path on your machine
    If DownloadFile(strUrl, strTempFileName) Then
        Set objFso = New FileSystemObject
        With objFso.OpenTextFile(strTempFileName, ForReading)
            strResponse = .ReadAll
            .Close
        End With
        objFso.DeleteFile strTempFileName
    Else
        Debug.Print "Error downloading file from URL: " & strUrl
        GoTo ExitFunction
    End If
    dteFinish = Now

    Debug.Print "URL download file method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & DateDiff("s", dteStart, dteFinish) & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
    Dim lngRetVal As Long

    ' URLDownloadToFileA returns 0 (S_OK) on success
    lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
    If lngRetVal = 0 Then
        DownloadFile = True
    Else
        DownloadFile = False
    End If
End Function

Sub TestCreateDocumentFromUrl(strUrl As String)
    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strResponse As String
    Dim objDoc1 As HTMLDocument
    Dim objDoc2 As HTMLDocument

    On Error GoTo ExitFunction

    dteStart = Now
    Set objDoc1 = New HTMLDocument
    Set objDoc2 = objDoc1.createDocumentFromUrl(strUrl, "null")
    While objDoc2.readyState <> "complete"
        DoEvents
    Wend
    strResponse = objDoc2.DocumentElement.outerHTML
    Debug.Print strResponse   ' shows the "framing forbidden" page for the sample site
    dteFinish = Now

    Debug.Print "HTML Document Create from URL method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & DateDiff("s", dteStart, dteFinish) & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc2 = Nothing
    Set objDoc1 = Nothing
End Sub
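Posted on 2017-01-13 22:10:41
Most of the time is spent waiting for the server's response. So if you want to improve the execution time, send the requests in parallel.
I would also use the "Msxml2.ServerXMLHTTP.6.0" object/interface, since it does not implement any caching.
Here is a working example: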
Sub TestRequests()
    GetUrls _
        "http://stackoverflow.com/questions/34880012", _
        "http://stackoverflow.com/questions/34880013", _
        "http://stackoverflow.com/questions/34880014", _
        "http://stackoverflow.com/questions/34880015", _
        "http://stackoverflow.com/questions/34880016", _
        "http://stackoverflow.com/questions/34880017"
End Sub

Private Sub OnRequest(url, xhr)
    xhr.Open "GET", url, True   ' True = asynchronous
    xhr.setRequestHeader "Content-Type", "text/html; charset=UTF-8"
    xhr.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    xhr.Send
End Sub

Private Sub OnResponse(url, xhr)
    Debug.Print url, Len(xhr.ResponseText)
End Sub

Public Function GetUrls(ParamArray urls())
    Const WORKERS = 10

    ' create http workers
    ' even slots hold the request object, odd slots the index of the URL in flight
    Dim wkrs(0 To WORKERS * 2 - 1), i As Integer
    For i = 0 To UBound(wkrs) Step 2
        Set wkrs(i) = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    Next

    ' send the requests in parallel
    Dim index As Integer, count As Integer, xhr As Object
    While count <= UBound(urls)
        For i = 0 To UBound(wkrs) Step 2
            Set xhr = wkrs(i)
            If xhr.readyState And 3 Then ' if busy
                xhr.waitForResponse 0.01 ' wait 10ms
            ElseIf Not VBA.IsEmpty(wkrs(i + 1)) And xhr.readyState = 4 Then
                OnResponse urls(wkrs(i + 1)), xhr
                count = count + 1
                wkrs(i + 1) = Empty
            End If
            ' hand the next pending URL to an idle worker
            If VBA.IsEmpty(wkrs(i + 1)) And index <= UBound(urls) Then
                wkrs(i + 1) = index
                OnRequest urls(index), xhr
                index = index + 1
            End If
        Next
    Wend
End Function
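OnResponse in the example only prints the response length. As a rough sketch, the responses could instead be kept for later parsing by having OnResponse forward to a handler like the one below. The names mResults, StoreResponse and DumpResults are hypothetical, and the module-level declaration has to sit at the top of the module.

```vba
' Hypothetical module-level store: URL -> response text (late-bound Dictionary)
Private mResults As Object

' Hypothetical handler: call "StoreResponse url, xhr" from OnResponse.
Private Sub StoreResponse(ByVal url As String, ByVal xhr As Object)
    If mResults Is Nothing Then Set mResults = CreateObject("Scripting.Dictionary")

    If xhr.Status = 200 Then
        mResults(url) = xhr.responseText   ' keep the HTML for later parsing
    Else
        Debug.Print "Request failed: " & url, xhr.Status, xhr.statusText
    End If
End Sub

' Lists what was collected once GetUrls has returned.
Sub DumpResults()
    Dim key As Variant

    If mResults Is Nothing Then Exit Sub
    For Each key In mResults.Keys
        Debug.Print key, Len(mResults(key))
    Next key
End Sub
```

Checking Status before storing keeps failed requests (for example 404 pages) out of the results.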
https://stackoverflow.com/questions/41523223