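I have the VBA code below, which was the answer to my last question. It iterates over a list of URLs and, using Word to extract the text, generates a text file for each one.
However, for the following URL, https://hpvchemicals.oecd.org/ui/handler.axd?id=97a7b56f-ebaf-4416-8b4b-88b19ca3bd16, the code fails with run-time error '5': Invalid procedure call or argument.
Strangely, the text of the PDF is printed to the console but is not written to the text file.
I don't quite understand why this happens, since the PDF doesn't appear to be any different from the others that succeed.
VBA code (requires a reference to Microsoft Scripting Runtime):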
Sub Tester()
    Dim filePath As String
    Dim fso As FileSystemObject
    Set fso = New FileSystemObject
    Dim fileStream As TextStream
    Dim oWd As Object, oDoc As Object, c As Range
    Set oWd = CreateObject("word.application")
    n = 1
    For Each c In Range("B2:B200").Cells
        filePath = Range("D2").Value & "\" & Range("A" & n).Value & ".txt"
        Debug.Print filePath
        Set fileStream = fso.CreateTextFile(filePath)
        Debug.Print c.Value
        With oWd.Documents.Open(c.Value)
            Debug.Print .Range.Text
            'write to a file...
            fileStream.WriteLine .Range.Text
            fileStream.Close
        End With
        n = n + 1
    Next c
    oWd.Quit
End Sub
SetUp:
URLs:
https://hpvchemicals.oecd.org/ui/handler.axd?id=e19d2799-0c16-496d-a607-b09330dd28a7
https://hpvchemicals.oecd.org/ui/handler.axd?id=40da06b1-a855-4c0c-bc21-bbc856dca725
https://hpvchemicals.oecd.org/ui/handler.axd?id=c4967546-1f5e-472a-b629-a2998323735b
https://hpvchemicals.oecd.org/ui/handler.axd?id=bde5e625-83ee-423d-aa70-eb0e453088e4
https://hpvchemicals.oecd.org/ui/handler.axd?id=621c4f55-ef3c-4b99-bb98-e6aaf3f436dd
https://hpvchemicals.oecd.org/ui/handler.axd?id=26e1420d-f9b7-4768-b6fa-d345f54e7683
https://hpvchemicals.oecd.org/ui/handler.axd?id=263f3491-90c7-4c3a-b43e-4c4e9395bcea
https://hpvchemicals.oecd.org/ui/handler.axd?id=b78d39a9-26c2-48ff-aadc-cb056a89f08b
https://hpvchemicals.oecd.org/ui/handler.axd?id=97a7b56f-ebaf-4416-8b4b-88b19ca3bd16
https://hpvchemicals.oecd.org/ui/handler.axd?id=c6c3b7c1-9239-40d9-b51a-85a15e2411d6
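So, first of all, I am hoping the problem with the failing URL mentioned above can be resolved. I think I also need to introduce some error handling, so that a blank text file is generated and the loop continues to the next URL, but I don't know how to implement this.
I'm not very proficient in VBA. I have hard-coded the range B2:B200, but ideally the code would work for however many URLs are in the column.
Also, I'm not sure whether the logic in my code is particularly robust, or whether there is a better way to extract the text from the URLs.
The expected output looks like this:
Below is an example of a generated text file.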
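Posted on 2022-08-21 18:56:00
It seems the document returned from the problem URL contains some characters that cannot be written to a non-Unicode text file.
See the inline comments: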
Sub Tester()
    Dim filePath As String
    Dim fso As FileSystemObject, url
    Dim fileStream As TextStream, ws As Worksheet
    Dim oWd As Object, oDoc As Object, c As Range, fileRoot As String
    Set fso = New FileSystemObject
    Set oWd = CreateObject("word.application")
    Set ws = Worksheets("Data") 'use a specific worksheet reference
    fileRoot = ws.Range("D2").Value 'read this once
    If Right(fileRoot, 1) <> "\" Then fileRoot = fileRoot & "\" 'ensure terminating \
    For Each c In ws.Range("B2:B" & ws.Cells(Rows.Count, "B").End(xlUp).row).Cells
        url = Trim(c.Value)
        If LCase(url) Like "http?:*" Then 'has a URL
            Set oDoc = Nothing
            On Error Resume Next 'ignore error if no document...
            Set oDoc = oWd.Documents.Open(url)
            On Error GoTo 0 'stop ignoring errors
            If Not oDoc Is Nothing Then
                filePath = fileRoot & c.Offset(0, -1).Value & ".txt" 'filename from ColA
                Debug.Print filePath
                'open text stream as unicode
                Set fileStream = fso.CreateTextFile(filePath, overwrite:=True, Unicode:=True)
                fileStream.Write oDoc.Range.Text
                fileStream.Close
                oDoc.Close
                c.Interior.Color = vbGreen 'flag OK
            Else
                c.Interior.Color = vbRed 'flag problem
            End If
        End If 'have url
    Next c
    oWd.Quit
End Sub
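The failure described above can be looked at in isolation, without Word or a download. The following is only a minimal sketch: the procedure name, the sample string, and the temp-file path are made up for illustration, and whether the first write actually fails depends on the system's ANSI code page (the sample character is assumed not to be representable in it).

```vba
Sub UnicodeWriteDemo()
    ' Hypothetical demo: compares writing the same string to an ANSI
    ' and to a Unicode TextStream. Late binding, so no reference needed.
    Dim fso As Object, ts As Object
    Dim sample As String, demoPath As String

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Assumption: U+2265 cannot be represented in the system's ANSI code page.
    sample = "value " & ChrW(&H2265) & " 5"
    demoPath = Environ$("TEMP") & "\unicode_demo.txt"

    ' 1) ANSI stream (third argument False), as in the question's code,
    '    where CreateTextFile was called with its defaults.
    Set ts = fso.CreateTextFile(demoPath, True, False)
    On Error Resume Next
    ts.Write sample
    Debug.Print "ANSI stream, Err.Number = " & Err.Number
    On Error GoTo 0
    ts.Close

    ' 2) Unicode stream (third argument True), as in the answer's code.
    Set ts = fso.CreateTextFile(demoPath, True, True)
    On Error Resume Next
    ts.Write sample
    Debug.Print "Unicode stream, Err.Number = " & Err.Number
    On Error GoTo 0
    ts.Close
End Sub
```

If the first line printed shows a non-zero error number and the second shows 0, that would be consistent with the explanation above: Debug.Print can display the text, but the non-Unicode stream cannot store it.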
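Posted on 2022-08-21 18:56:00
Using Excel, Notepad, or any text editor, you can export the columns (as you would for a CSV) to build a cmd file. That file can have extra structure, such as not showing anything while it runs, but personally I would want to use call and see the progress confirmed.
Note: the second argument must be "quoted".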
download2txt.cmd
call URLpdf2txt Name1 "https://hpvchemicals.oecd.org/UI/handler.axd?id=e19d2799-0c16-496d-a607-b09330dd28a7"
call URLpdf2txt Name2 "https://hpvchemicals.oecd.org/UI/handler.axd?id=40da06b1-a855-4c0c-bc21-bbc856dca725"
call URLpdf2txt Name3 "https://hpvchemicals.oecd.org/UI/handler.axd?id=c4967546-1f5e-472a-b629-a2998323735b"
call URLpdf2txt Name4 "https://hpvchemicals.oecd.org/UI/handler.axd?id=bde5e625-83ee-423d-aa70-eb0e453088e4"
call URLpdf2txt Name5 "https://hpvchemicals.oecd.org/UI/handler.axd?id=621c4f55-ef3c-4b99-bb98-e6aaf3f436dd"
call URLpdf2txt Name6 "https://hpvchemicals.oecd.org/UI/handler.axd?id=26e1420d-f9b7-4768-b6fa-d345f54e7683"
call URLpdf2txt Name7 "https://hpvchemicals.oecd.org/UI/handler.axd?id=263f3491-90c7-4c3a-b43e-4c4e9395bcea"
call URLpdf2txt Name8 "https://hpvchemicals.oecd.org/UI/handler.axd?id=b78d39a9-26c2-48ff-aadc-cb056a89f08b"
call URLpdf2txt Name9 "https://hpvchemicals.oecd.org/UI/handler.axd?id=97a7b56f-ebaf-4416-8b4b-88b19ca3bd16"
call URLpdf2txt Name10 "https://hpvchemicals.oecd.org/UI/handler.axd?id=c6c3b7c1-9239-40d9-b51a-85a15e2411d6"
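One way a file like the one above could be produced straight from the worksheet is sketched below. This is an illustration only, under several assumptions: the names are in column A and the URLs in column B of the active sheet (as in the question), the names contain no spaces or characters special to cmd, the URLs contain no % signs, and the workbook has been saved so that ThisWorkbook.Path is not empty. The procedure name BuildCmdFile is hypothetical.

```vba
Sub BuildCmdFile()
    ' Hypothetical helper: writes one "call URLpdf2txt <name> "<url>""
    ' line per URL found in column B, taking the name from column A.
    Dim ws As Worksheet, c As Range
    Dim fso As Object, ts As Object
    Dim lastRow As Long, url As String

    Set ws = ActiveSheet   ' assumption: the sheet holding the list is active
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    If lastRow < 2 Then Exit Sub   ' nothing below the header row

    Set fso = CreateObject("Scripting.FileSystemObject")
    ' Plain (non-Unicode) text, overwriting any existing file.
    Set ts = fso.CreateTextFile(ThisWorkbook.Path & "\download2txt.cmd", True, False)

    For Each c In ws.Range("B2:B" & lastRow).Cells
        url = Trim$(CStr(c.Value))
        If LCase$(url) Like "http*" Then
            ' The doubled quotes emit literal " characters around the URL,
            ' so the second argument is quoted.
            ts.WriteLine "call URLpdf2txt " & CStr(c.Offset(0, -1).Value) & " """ & url & """"
        End If
    Next c

    ts.Close
End Sub
```

The generated download2txt.cmd would then need to sit in the same folder as URLpdf2txt.cmd, or somewhere from which call can find it.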
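In the comments you mentioned that pdftotext is not installed as a native command, so the first step is to make sure a local copy is available. The helper script URLpdf2txt can then be used as specified above.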
URLpdf2txt.cmd
@echo off
if not exist xpdf-tools-win-4.04/bin32/pdftotext.exe curl -o %temp%\xpdftools.zip https://dl.xpdfreader.com/xpdf-tools-win-4.04.zip && tar -m -xf %temp%\xpdftools.zip xpdf-tools-win-4.04/bin32/pdftotext.exe
curl -o "%~dpn1.pdf" "%~2"
"xpdf-tools-win-4.04/bin32/pdftotext.exe" -nopgbrk -layout -enc UTF-8 "%~dpn1.pdf" "%~dpn1.txt"
https://stackoverflow.com/questions/73434525