Q: Extracting the first two lines of a PDF with Python and pyPDF

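Stack Overflow user

Asked 2016-09-29 04:46:58

Answers: 1, Views: 2.3K, Followers: 0, Votes: 1

I'm using Python 2.7 and pyPDF to read the title from a PDF file's metadata. Unfortunately, not all PDFs have metadata. What I want to do now is grab the first two lines of text from the PDF. How can I capture the first two lines with pyPDF, starting from the code I have now?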

from pyPdf import PdfFileWriter, PdfFileReader
import os

for fileName in os.listdir('.'):
    try:
        if fileName.lower()[-3:] != "pdf": continue
        input1 = PdfFileReader(file(fileName, "rb"))

        # print the title of document1.pdf
        print fileName, input1.getDocumentInfo().title
    except:
        print ",",

python

python-2.7

pypdf

1 Answer

Stack Overflow user

Accepted answer

Posted 2016-09-29 04:53:50

from PyPDF2 import PdfFileReader
import StringIO

fileName = "HMM.pdf"
try:
    if fileName.lower().endswith(".pdf"):
        # Open in binary mode; file() is the Python 2 builtin.
        input1 = PdfFileReader(file(fileName, "rb"))

        # print the title of document1.pdf
        #print fileName, input1.getDocumentInfo().title

        # Extract the text of the first page and read it line by line.
        content = input1.getPage(0).extractText()
        buf = StringIO.StringIO(content)
        print buf.readline().rstrip()
        print buf.readline().rstrip()

except Exception:
    print ",",

My working directory contains this "HMM.pdf" file, and this code runs correctly on Python 2.7.
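The fallback the question asks for (use the title from the metadata when it exists, otherwise the first two lines of text) can be sketched as two small helpers. This is only a sketch under some assumptions: `first_lines` and `title_or_first_lines` are hypothetical names, not part of pyPDF or PyPDF2, and `sample_text` stands in for whatever `input1.getPage(0).extractText()` returns. Unlike the `readline()` calls above, it skips blank lines, since extracted text often starts with empty lines. It uses only built-in string methods, so it should run on Python 2.7 as well as Python 3.

```python
def first_lines(text, count=2):
    """Return up to `count` non-empty lines from `text`, stripped of whitespace."""
    lines = []
    for line in text.splitlines():
        if len(lines) >= count:
            break
        line = line.strip()
        if line:
            lines.append(line)
    return lines


def title_or_first_lines(title, page_text, count=2):
    """Prefer the metadata title; fall back to the first lines of the page text."""
    if title and title.strip():
        return title.strip()
    return " ".join(first_lines(page_text, count))


# Hypothetical sample standing in for input1.getPage(0).extractText().
sample_text = "\nHidden Markov Models\n\nA Tutorial\nAbstract ...\n"
print(first_lines(sample_text))
print(title_or_first_lines(None, sample_text))
```

In the loop from the question, this would be called as `title_or_first_lines(input1.getDocumentInfo().title, input1.getPage(0).extractText())`. Text extraction does not always preserve line breaks, so for some PDFs the "first two lines" may come back as one long line.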

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/39761609
