
Q: Actually cropping a PDF with PDF Clown

Stack Overflow user

Asked 2016-06-06 17:07:42

Answers: 1, Views: 316, Followers: 0, Votes: 2

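My goal is to actually crop a PDF file with PdfClown. Many tools and libraries can crop a PDF by changing its cropBox. That hides the content outside a rectangular area, but the content is still there: a PDF parser can still access it, and the file size does not change.

What I need instead is to create a new page that contains only the content inside the rectangular area.

So far I have tried scanning the contents and selectively cloning them, but I have not succeeded. Any suggestions for doing this with PdfClown?

I have seen someone attempting something similar with PDFBox in "Cropping a region from a PDF page with PDFBox", but without success so far.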

pdf

pdfclown

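1 Answer

Stack Overflow user

Answered 2020-07-28 15:07:30

A bit late, but it may help someone: I am successfully doing what you ask, but with other libraries. Required libraries: iText 4 or 5, and Ghostscript.

Step 1, with pseudocode:

Use iText to create a PdfWriter instance with an empty document. Open a PdfReader on the original file you want to crop. Import the page, get a PdfTemplate object from the source, set its .boundingBox property to the desired crop box, wrap the template in an iText Image object, and paste it onto the new page at an absolute position.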

Imports iTextSharp.text
Imports iTextSharp.text.pdf

' sourcefile, outputfilename, indexOfPageToGet, x, y and llx/lly/urx/ury are placeholders to fill in
Dim reader As New PdfReader(sourcefile)
Dim doc As New Document()
Dim writer As PdfWriter = PdfWriter.GetInstance(doc, New System.IO.FileStream(outputfilename, System.IO.FileMode.Create))
' the document must be open before content can be added to it
doc.Open()

' get the source page as an imported page
Dim page As PdfImportedPage = writer.GetImportedPage(reader, indexOfPageToGet)

' create a PdfTemplate at the original size of the source page - see the iText in Action book, page 91, for full details
Dim pdftemp As PdfTemplate = page.CreateTemplate(page.Width, page.Height)
' paste the original page onto the template; see the iText documentation for what these parameters do (scaling, mirroring)
pdftemp.AddTemplate(page, 1, 0, 0, 1, 0, 0)
' now the critical part: set the bounding box on the template. This makes all objects outside the rectangle invisible
pdftemp.BoundingBox = New iTextSharp.text.Rectangle(llx, lly, urx, ury)
' the template itself is finished, so release it
writer.ReleaseTemplate(pdftemp)
' wrap the template in an iText Image object; with it, absolute positioning on the final page is much easier
Dim img As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(pdftemp)
' set the image position
img.SetAbsolutePosition(x, y)
' set an optional rotation if needed
img.RotationDegrees = 0
' finally, this adds the actual content to the new document
doc.Add(img)
' cleanup
doc.Close()
reader.Close()
writer.Close()

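The output file looks visually cropped, but the objects are still present in the PDF stream. The file size will probably stay almost the same.

Step 2:

Using Ghostscript with the pdfwrite output device and the right command-line parameters, you can reprocess the PDF from step 1. This gives you a much smaller PDF. For the parameters, see the Ghostscript documentation: https://www.ghostscript.com/doc/9.52/Use.htm
This step actually removes the objects outside the bounding box, which is what you asked for in the original post, at least for the files I process.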
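The answer does not list the exact Ghostscript parameters, so the following is only a minimal sketch of what step 2 could look like when driven from Python. It assumes the executable is called `gs` (on Windows it is typically `gswin64c`) and uses only the `-sDEVICE=pdfwrite` and `-o` switches; the helper names `build_gs_command` and `reprocess_pdf` and the file names are made up for illustration. Whether the clipped objects are really dropped depends on the input file, as the answer itself notes.

```python
import shutil
import subprocess
from typing import List


def build_gs_command(src: str, dst: str, gs_binary: str = "gs") -> List[str]:
    """Build a Ghostscript command line that rewrites src into dst via pdfwrite."""
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",  # select the PDF-writing output device
        "-o", dst,            # output file; -o also implies -dBATCH -dNOPAUSE
        src,                  # input: the visually cropped PDF from step 1
    ]


def reprocess_pdf(src: str, dst: str, gs_binary: str = "gs") -> None:
    """Run Ghostscript on src; raises if the binary is missing or exits non-zero."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"{gs_binary} not found on PATH")
    subprocess.run(build_gs_command(src, dst, gs_binary), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with the output of step 1.
    print(" ".join(build_gs_command("step1.pdf", "step2.pdf")))
```

Keeping the command construction separate from the call makes it easy to add further pdfwrite options from the Ghostscript documentation once you know which ones your files need.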

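Optional step 3: mutool with the -g option can clean up unused XREF objects. The original PDF may have many XREF entries, which increases the file size. After cropping, some of them may no longer be needed. See https://mupdf.com/docs/manual-mutool-clean.html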
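As a rough sketch of this optional cleanup, the snippet below calls `mutool clean -g` from Python. It assumes `mutool` is on the PATH; the helper names `build_mutool_clean_command` and `clean_pdf` and the file names are hypothetical, chosen only for this example.

```python
import subprocess
from typing import List


def build_mutool_clean_command(src: str, dst: str) -> List[str]:
    """Build a `mutool clean` command that garbage-collects unused objects."""
    # -g asks mutool to drop objects that are no longer referenced
    return ["mutool", "clean", "-g", src, dst]


def clean_pdf(src: str, dst: str) -> None:
    """Run mutool on src and write the cleaned file to dst; raises on failure."""
    subprocess.run(build_mutool_clean_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with the output of the previous step.
    print(" ".join(build_mutool_clean_command("step2.pdf", "final.pdf")))
```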

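PDF is a tricky format. In general I agree with @Tilman Hausherr: my suggestion may not work for every file and does not cover the 'almost impossible' cases, but it works in every case I have dealt with.

Votes: 0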

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/37653475
