Q: Python + PyPdf: crop a region of a page and paste it into another page

Stack Overflow user

Asked 2019-02-26 23:21:20

2 answers · 3.1K views · 0 followers · 1 vote

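Suppose you have a PDF page that contains various complex elements. The goal is to crop a region of the page (extracting just one of the elements) and then paste it into another PDF page.

Here is a simplified version of my code: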

import PyPDF2  # legacy PyPDF2 1.x API (PdfFileReader, getPage, ...)

def extract_tree(in_file, out_file):
    with open(in_file, 'rb') as infp:
        # Read the document that contains the tree (on its first page)
        reader = PyPDF2.PdfFileReader(infp)
        page = reader.getPage(0)

        # Crop the tree. The coordinates below are only illustrative
        page.cropBox.lowerLeft = [100, 200]
        page.cropBox.upperRight = [250, 300]

        # Create an empty document and add a single page containing only the cropped page
        writer = PyPDF2.PdfFileWriter()
        writer.addPage(page)
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)

def insert_tree_into_page(tree_document, text_document):
    # Keep both input files open until the merged page has been written
    with open(text_document, 'rb') as text_fp, open(tree_document, 'rb') as tree_fp:
        # Load the first page of the document containing 'text text text text...'
        text_page = PyPDF2.PdfFileReader(text_fp).getPage(0)

        # Load the previously cropped tree (cropped using 'extract_tree')
        tree_page = PyPDF2.PdfFileReader(tree_fp).getPage(0)

        # Overlay the text page and the tree crop
        text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)

        # Save the result into a new empty document
        output = PyPDF2.PdfFileWriter()
        output.addPage(text_page)
        with open('merged_document.pdf', 'wb') as out_fp:
            output.write(out_fp)


# First, crop the tree and save it into cropped_document.pdf
extract_tree('document1.pdf', 'cropped_document.pdf')

# Now merge document2.pdf with cropped_document.pdf
insert_tree_into_page('cropped_document.pdf', 'document2.pdf')

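The "extract_tree" function seems to work: it produces a PDF file that shows only the cropped region (in this case, the tree). The problem is that when I try to paste the tree into the new page, the star and the house from the original image are pasted as well, no matter what I do.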

Tags: pdf-generation, pypdf2, pypdf, python, merge

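2 Answers

Stack Overflow user

Answered 2019-08-17 02:31:22

I tried something that actually works. Convert the first output (the PDF containing only the tree) to docx, then convert it from docx back to PDF before merging it with the other PDF page. That works (only the tree gets merged).

May I ask how you implemented the interface for defining the crop boundaries?

Votes: 0

Stack Overflow user

Answered 2021-02-22 17:33:59

I had the same problem. In the end, my solution was to make a small edit to the PyPDF2 source code (taken from this pull request, which never made it into the main branch). What you need to do is insert these lines into the method _mergePage of the class PageObject in the file pdf.py: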

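page2Content = ContentStream(page2Content, self.pdf)
# Clip page2's content to its trim box before it is merged:
# "re" adds the rectangle to the path, "W" makes it the clipping path, "n" ends the path without painting
box = page2.trimBox
rect = [FloatObject(v) for v in (box.getLowerLeft_x(), box.getLowerLeft_y(), box.getWidth(), box.getHeight())]
page2Content.operations.insert(0, [rect, "re"])
page2Content.operations.insert(1, [[], "W"])
page2Content.operations.insert(2, [[], "n"])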

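(See the pull request for exactly where to put them.) Once that is done, you can crop part of a PDF and merge it with another page without any problems. There is no need to save the cropped part to a separate PDF unless you want to.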

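from PyPDF2 import PdfFileReader, PdfFileWriter

# Keep both source files open until the merged page has been written
with open('document1.pdf', 'rb') as tree_fp, open('document2.pdf', 'rb') as text_fp:
    tree_page = PdfFileReader(tree_fp).getPage(0)
    text_page = PdfFileReader(text_fp).getPage(0)

    # Crop the tree (illustrative coordinates)
    tree_page.cropBox.lowerLeft = [100, 200]
    tree_page.cropBox.upperRight = [250, 300]

    # Overlay the cropped tree onto the text page and save the result
    text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)
    output = PdfFileWriter()
    output.addPage(text_page)
    with open('merged_document.pdf', 'wb') as out_fp:
        output.write(out_fp)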
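To make the three inserted operations less opaque, here is a minimal standard-library sketch of the content-stream text they correspond to. It does not use PyPDF2 at all; the helper name clip_prefix is hypothetical and exists only for illustration. In PDF content streams, "re" appends a rectangle to the current path, "W" marks that path as the clipping path, and "n" ends the path without painting it, so anything drawn afterwards is only visible inside the rectangle.

```python
def clip_prefix(x, y, width, height):
    """Return PDF content-stream operators that clip drawing to a rectangle.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    # "re" appends a rectangle to the current path,
    # "W" marks that path as the new clipping path,
    # "n" ends the path without stroking or filling it.
    return "{:g} {:g} {:g} {:g} re W n".format(x, y, width, height)


# Crop box from the example: lower-left (100, 200), upper-right (250, 300)
lower_left = (100, 200)
upper_right = (250, 300)
width = upper_right[0] - lower_left[0]
height = upper_right[1] - lower_left[1]

prefix = clip_prefix(lower_left[0], lower_left[1], width, height)
print(prefix)  # 100 200 150 100 re W n
```

Placing such a prefix ahead of the merged page's drawing commands is what hides the star and the house: they are still present in the content stream, but they fall outside the clipping rectangle.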

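There may be a better way to inject this code without editing the source directly. If anyone can find one, I would appreciate it, because this is admittedly a somewhat fragile hack.

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54888798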
