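I'm using Python 2.7 and pyPdf to read the title metadata from PDF files. Unfortunately, not all PDFs have metadata. What I'd like to do now is grab the first two lines of text from the PDF. How can I capture the first two lines with pyPdf, starting from the code I have now?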
from pyPdf import PdfFileWriter, PdfFileReader
import os

for fileName in os.listdir('.'):
    try:
        # skip anything that is not a PDF
        if fileName.lower()[-3:] != "pdf": continue
        input1 = PdfFileReader(file(fileName, "rb"))
        # print the title of document1.pdf
        print fileName, input1.getDocumentInfo().title
    except:
        print ",",

Answered 2016-09-29 04:53:50
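from PyPDF2 import PdfFileWriter, PdfFileReader
import os
import StringIO

fileName = "HMM.pdf"
try:
    if fileName.lower()[-3:] == "pdf":
        input1 = PdfFileReader(file(fileName, "rb"))
        # print the title of document1.pdf
        #print fileName, input1.getDocumentInfo().title
        # extract the text of the first page and wrap it in a file-like buffer
        content = input1.getPage(0).extractText()
        buf = StringIO.StringIO(content)
        # print the first two lines (readline keeps the newline, hence the trailing comma)
        print buf.readline(),
        print buf.readline(),
except:
    print ",",

My working directory contains this "HMM.pdf" file, and this code runs correctly under Python 2.7.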
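To tie this back to the question, here is a rough sketch that uses the metadata title when there is one and falls back to the first lines of page 0 otherwise. The helper names `first_lines` and `title_or_first_lines` are made up for illustration, and it assumes a PyPDF2 release older than 3.0, where `PdfFileReader`, `getPage`, `extractText` and `getDocumentInfo` are still available. Also note that `extractText` does not always preserve line breaks, so for some PDFs the "first two lines" may come back as one long string.

```python
def first_lines(text, count=2):
    """Return up to `count` non-empty lines of `text`, stripped of whitespace."""
    stripped = [line.strip() for line in text.splitlines()]
    return [line for line in stripped if line][:count]


def title_or_first_lines(path, count=2):
    """Return the metadata title, or else the first lines of the first page.

    Hypothetical helper, not part of pyPdf/PyPDF2.
    """
    # Imported here so first_lines() stays usable without PyPDF2 installed.
    # Assumption: PyPDF2 older than 3.0 (same API as in the answer above).
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as handle:
        reader = PdfFileReader(handle)
        info = reader.getDocumentInfo()  # may be None when the PDF has no metadata
        if info is not None and info.title:
            return info.title
        text = reader.getPage(0).extractText()
    return " ".join(first_lines(text, count))
```

Calling `title_or_first_lines("HMM.pdf")` inside the loop from the question would then give you something to print for every file. Skipping empty lines is a design choice on my part, since extracted text often starts with blank lines.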
https://stackoverflow.com/questions/39761609