
Why does PDFBox return image dimensions of size 0x0?

Stack Overflow user

Asked on 2018-05-29 05:03:07

Answers: 1, Views: 320, Follows: 0, Votes: 1

To find the actual size of the images in a PDF, I used PDFBox and followed what is described in this answer. So basically I call:

// Computes the actual location and dimensions of the images
PrintImageLocations renderer = new PrintImageLocations();

for (int i = 0; i < pageLimit; ++i) {
    PDPage page = pdf.getPage(i);

    renderer.processPage(page);
}

PrintImageLocations is taken from this PDFBox code example.

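However, with the PDF document I use for testing (generated by Ghostscript 910 (ps2write) from an image found on Wikipedia), the reported image size is 0 x 0, even though the PDF can be imported into GIMP or LibreOffice Draw.

So I wonder whether the code I am currently using is reliable for finding the image size, and what prevents it from finding the correct size here.

The PDF used for this test can be found here.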

==========

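Edit: Following @Itai's comment, it appears that the condition in processOperator is never satisfied, because no such operation is ever invoked. Hence processOperator from the superclass is called instead.

The only operations invoked are the following (I added a printout before the condition in the overridden processOperator method):

processing q
processing cm
processing gs
processing q
processing W
processing n
processing rg
processing re
processing cs
processing scn
processing f
processing f
processing Q
processing Q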

==========

Any hints are appreciated.

Tags: java, pdfbox

1 answer

Stack Overflow user

Accepted answer

Answered on 2018-05-29 15:41:10

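As you have already found out yourself, the reason for the 0x0 output is that the PrintImageLocations code as-is does not find the image at all.

PrintImageLocations does not find the image because it only looks for image usages in the page content and in form XObjects used in the page content (including nested ones). In the file at hand, on the other hand, the image is drawn inside the content of a tiling pattern that is used to fill an area in the page content.

To allow PDFBox to find this image, we have to extend the PrintImageLocations class a bit so that it also descends into pattern content streams, e.g. like this: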

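import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceRGBColor;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.examples.util.PrintImageLocations;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDTilingPattern;

class PrintImageLocationsImproved extends PrintImageLocations {
    // Path painting operators that fill (i.e. use the non-stroking color)
    final List<String> fillOperations = Arrays.asList("f", "F", "f*", "b", "b*", "B", "B*");

    public PrintImageLocationsImproved() throws IOException {
        super();
        // Keep the non-stroking (fill) color in the graphics state up to date
        addOperator(new SetNonStrokingColor());
        addOperator(new SetNonStrokingColorN());
        addOperator(new SetNonStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceGrayColor());
        addOperator(new SetNonStrokingDeviceRGBColor());
        addOperator(new SetNonStrokingColorSpace());
    }

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
        String operation = operator.getName();
        if (fillOperations.contains(operation)) {
            PDColor color = getGraphicsState().getNonStrokingColor();
            // Only pattern colors carry a pattern name
            if (color.getPatternName() != null) {
                PDAbstractPattern pattern = getResources().getPattern(color.getPatternName());
                if (pattern instanceof PDTilingPattern) {
                    // Nested analysis of the pattern content stream (defined in PDFStreamEngine)
                    processTilingPattern((PDTilingPattern) pattern, null, null);
                }
            }
        }
        // Let PrintImageLocations handle the operator itself (e.g. "Do" for images)
        super.processOperator(operator, operands);
    }
}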

(ExtractImageLocations.java, inner class PrintImageLocationsImproved: https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/extract/ExtractImageLocations.java#L132)

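The tiling pattern in the document at hand is used as a pattern color for filling, not for stroking. Thus, PrintImageLocationsImproved has to register the non-stroking color operator processors so that the fill color is updated correctly in the graphics state.

Now, before delegating to the PrintImageLocations implementation, processOperator first checks whether the operator is a fill operation. If it is, it inspects the current fill color. If that color is a tiling pattern, processOperator starts processTilingPattern, defined in PDFStreamEngine, which triggers a nested analysis of the pattern content stream and thereby eventually lets PrintImageLocationsImproved find the image.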
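To make this control flow concrete, here is a minimal, self-contained toy model of the dispatch in plain Java. It is not the PDFBox API: the Op class, the shared patterns map, and the names P0 and R8 are illustrative assumptions, every Do operand is treated as an image, and graphics-state save/restore (q/Q) is ignored. It shows that an operator sequence like the one traced in the question, which contains no Do on the page itself, yields no image unless fill operations descend into the current pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the fill-operation dispatch; NOT the PDFBox API.
class PatternDescentDemo {

    /** One content-stream operation: operator name plus an optional name operand. */
    static final class Op {
        final String name;
        final String operand;
        Op(String name, String operand) { this.name = name; this.operand = operand; }
    }

    // Same fill operators as fillOperations in PrintImageLocationsImproved
    static final Set<String> FILL_OPS = Set.of("f", "F", "f*", "b", "b*", "B", "B*");

    /** Returns the image names drawn via "Do"; patterns maps a pattern name to its content. */
    static List<String> findImages(List<Op> ops, Map<String, List<Op>> patterns, boolean descend) {
        List<String> images = new ArrayList<>();
        process(ops, patterns, descend, images);
        return images;
    }

    private static void process(List<Op> ops, Map<String, List<Op>> patterns,
                                boolean descend, List<String> images) {
        String fillPattern = null; // current fill pattern, null for a plain color
        for (Op op : ops) {
            switch (op.name) {
                case "scn":
                    // "/P0 scn" selects a pattern as the fill color (after "/Pattern cs")
                    fillPattern = (op.operand != null && patterns.containsKey(op.operand))
                            ? op.operand : null;
                    break;
                case "rg": case "g": case "k":
                    fillPattern = null; // a device color replaces any pattern
                    break;
                case "Do":
                    images.add(op.operand); // toy assumption: every XObject is an image
                    break;
                default:
                    if (descend && FILL_OPS.contains(op.name) && fillPattern != null) {
                        // Nested analysis of the pattern content stream
                        process(patterns.get(fillPattern), patterns, descend, images);
                    }
            }
        }
    }

    public static void main(String[] args) {
        List<Op> page = List.of(new Op("q", null), new Op("rg", null), new Op("re", null),
                new Op("cs", "Pattern"), new Op("scn", "P0"), new Op("f", null), new Op("Q", null));
        Map<String, List<Op>> patterns = Map.of("P0", List.of(new Op("Do", "R8")));
        System.out.println(findImages(page, patterns, false)); // []
        System.out.println(findImages(page, patterns, true));  // [R8]
    }
}
```

In the real class, handling scn corresponds to registering the SetNonStrokingColor* processors: without them the graphics state never learns that the fill color is a pattern, so the check in processOperator would never fire.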

Use PrintImageLocationsImproved like this:

try (   PDDocument document = PDDocument.load(...)    )
{
    PrintImageLocations printer = new PrintImageLocationsImproved();
    int pageNum = 0;
    for( PDPage page : document.getPages() )
    {
        pageNum++;
        System.out.println( "Processing page: " + pageNum );
        printer.processPage(page);
    }
}

(ExtractImageLocations.java, test testExtractLikeHelloWorldImprovedFromTopSecret: https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/extract/ExtractImageLocations.java#L95)

As a result, the image is found for your PDF file:

Processing page: 1
*******************************************************************
Found image [R8]
position in PDF = 39.0, 102.48 in user space units
raw image size  = 1209, 1640 in pixels
displayed size  = 516.3119, 700.3752 in user space units
displayed size  = 7.1709986, 9.727433 in inches at 72 dpi rendering
displayed size  = 182.14336, 247.0768 in millimeters at 72 dpi rendering
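The inch and millimeter figures follow from the user space size by plain unit conversion, assuming the default user space unit of 1/72 inch (no /UserUnit entry on the page), which is what the "at 72 dpi rendering" labels refer to. A small sketch; the class and method names are hypothetical:

```java
// Hypothetical helper: converts displayed sizes in PDF user space units.
// Assumes the default user space unit of 1/72 inch (no /UserUnit on the page).
class UserSpaceUnits {
    static final float UNITS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    static float toInches(float units) {
        return units / UNITS_PER_INCH;
    }

    static float toMillimeters(float units) {
        return toInches(units) * MM_PER_INCH;
    }

    /** Effective resolution: raw pixels divided by displayed size in inches. */
    static float effectiveDpi(int pixels, float units) {
        return pixels / toInches(units);
    }

    public static void main(String[] args) {
        float w = 516.3119f, h = 700.3752f; // displayed size reported above
        System.out.println(toInches(w) + ", " + toInches(h));
        System.out.println(toMillimeters(w) + ", " + toMillimeters(h));
        System.out.println(effectiveDpi(1209, w) + ", " + effectiveDpi(1640, h));
    }
}
```

For this image, the raw pixel size divided by the displayed size in inches gives an effective resolution of roughly 168.6 dpi in both directions.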

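Beware: this is not a perfect fix but rather a proof of concept and a work-around, because it neither properly restricts the pattern to the area actually filled nor returns multiple findings for an area large enough to need multiple pattern tiles to fill it. Nonetheless, it does return the image match for the file at hand.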
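To see why a single match can undercount, one can count how many tiles of a pattern grid a filled rectangle overlaps. A minimal sketch, assuming an axis-aligned tile grid anchored at the pattern space origin, positive XStep/YStep, a pattern cell whose bounding box equals one step, and a rectangle already transformed into pattern space; the class and method names are hypothetical:

```java
// Hypothetical helper: counts pattern tiles overlapping an axis-aligned rectangle.
// Assumes the rectangle is in pattern space, the grid starts at the origin,
// steps are positive, and each cell's bounding box equals one step.
class TileCount {
    /** Number of grid cells of size step that overlap the interval [from, to). */
    static long cellsAlong(double from, double to, double step) {
        if (to <= from) {
            return 0;
        }
        return (long) (Math.ceil(to / step) - Math.floor(from / step));
    }

    static long tilesCovering(double x0, double y0, double x1, double y1,
                              double xStep, double yStep) {
        return cellsAlong(x0, x1, xStep) * cellsAlong(y0, y1, yStep);
    }

    public static void main(String[] args) {
        System.out.println(tilesCovering(0, 0, 600, 600, 200, 200));     // 9: 3 x 3 cells
        System.out.println(tilesCovering(100, 100, 500, 300, 200, 200)); // 6: 3 x 2 cells
    }
}
```

Whenever this count exceeds 1, the image is painted several times, while PrintImageLocationsImproved reports it only once per fill operation.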

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/50576614
