Q: FlateDecode PDF decoding

Stack Overflow user

Asked on 2016-11-23 10:37:14

1 answer · 7.1K views · 0 followers · score 1

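I have output from iTextSharp that a PDF reader parses without problems, but I want to take the binary content and parse it by hand. I tried taking the text between the markers <</Length 256/Filter/FlateDecode>>stream and endstream and decompressing it with the .NET DeflateStream class, which produced this exception: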

System.IO.InvalidDataException: Block length does not match with its complement.
   at System.IO.Compression.Inflater.DecodeUncompressedBlock(Boolean& end_of_block)
   at System.IO.Compression.Inflater.Decode()
   at System.IO.Compression.Inflater.Inflate(Byte[] bytes, Int32 offset, Int32 length)
   at System.IO.Compression.DeflateStream.Read(Byte[] array, Int32 offset, Int32 count)
   at System.IO.Stream.InternalCopyTo(Stream destination, Int32 bufferSize)
   at FlateDecodeTest.Decompress(Byte[] data)

My code is:

using System;
using System.Security.Cryptography;
using System.Text;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

public class FlateDecodeTest
{
    public static void Main() 
    {
        string s = @"xœuÁN!E÷|Å...";

        byte[] b = Decompress(GetBytes(s));

        Console.WriteLine(GetString(b));
    }

    public static byte[] Decompress(byte[] data)
    {
        Console.WriteLine(data.Length);
        byte[] decompressedArray = null;
        try
        {
            using (MemoryStream decompressedStream = new MemoryStream())
            {
                using (MemoryStream compressStream = new MemoryStream(data))
                {
                    using (DeflateStream deflateStream = new DeflateStream(compressStream, CompressionMode.Decompress))
                    {
                        deflateStream.CopyTo(decompressedStream);
                    }
                }
                decompressedArray = decompressedStream.ToArray();
            }
        }
        catch (Exception exception)
        {
            Console.WriteLine(exception);
        }

        return decompressedArray;
    }

    static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }
}

itext

.net

pdf

1 Answer

Stack Overflow user

Accepted answer

Posted on 2016-11-23 10:53:49

Don't use the DeflateStream class. If you are interested in the content stream of a page (for example, page 1), you can use this method:

byte[] streamBytes = reader.GetPageContent(1);

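where reader is an instance of the PdfReader class. Of course, this is not sufficient if the page's resource dictionary contains form XObjects. In that case you have to work with PRStream objects. For example, if a form XObject (or any other stream object) has object number 23, you can get the PRStream object like this: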

PRStream str = (PRStream)reader.GetPdfObject(23);
byte[] bytes = PdfReader.GetStreamBytes(str);

Unlike the GetStreamBytesRaw() method, which returns the original compressed bytes, the GetStreamBytes() method decompresses the stream. See iTextSharp: Convert PdfObject to PdfStream
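As background on why the question's approach fails: FlateDecode data is stored in the zlib format (RFC 1950), which wraps the raw deflate data (RFC 1951) in a 2-byte header and a trailing Adler-32 checksum, while DeflateStream expects raw deflate only. The leading "xœ" in the question is that header (0x78 0x9C). The question's GetBytes helper also copies UTF-16 code units, so the bytes are already altered before decompression starts. The following is only a minimal sketch for readers who want to inflate raw stream bytes by hand; the class name ZlibSketch, the method InflateZlib, and the sample content are hypothetical, and the demo buffer omits the Adler-32 trailer that a real zlib stream carries.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZlibSketch
{
    // Hypothetical helper: inflates zlib-wrapped data by skipping the
    // 2-byte zlib header and passing the rest to DeflateStream, which
    // only understands raw deflate data.
    public static byte[] InflateZlib(byte[] data)
    {
        // The low nibble of the first byte is the compression method; 8 means deflate.
        if (data == null || data.Length < 2 || (data[0] & 0x0F) != 8)
            throw new InvalidDataException("Not a zlib stream.");

        using (var input = new MemoryStream(data, 2, data.Length - 2))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Sample content, made up for this demo.
        byte[] original = Encoding.ASCII.GetBytes("BT /F1 12 Tf (Hello) Tj ET");

        // Build a zlib-like buffer: header bytes followed by raw deflate data.
        // The Adler-32 trailer is left out here for brevity.
        byte[] zlibLike;
        using (var buffer = new MemoryStream())
        {
            buffer.WriteByte(0x78);
            buffer.WriteByte(0x9C);
            using (var deflater = new DeflateStream(buffer, CompressionMode.Compress, true))
            {
                deflater.Write(original, 0, original.Length);
            }
            zlibLike = buffer.ToArray();
        }

        // Round trip: prints the sample content again.
        Console.WriteLine(Encoding.ASCII.GetString(InflateZlib(zlibLike)));
    }
}
```

The bytes passed to such a helper must be read from the file as bytes (for instance via GetStreamBytesRaw()), never pasted through a string. This sketch also ignores predictors and other filters, which GetStreamBytes() handles for you.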

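If you don't know the object number of the object you want to inspect, you can walk the PDF's object tree, for example using the methods of PdfDictionary, PdfArray, and so on.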
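A simpler alternative to a recursive walk, when the goal is just to find candidate stream objects, is to enumerate every object number in the cross-reference table. The sketch below assumes iTextSharp 5.x; the class name StreamLister and the path "input.pdf" are placeholders, and the try/catch is there because decoding can fail for filters the library does not decode.

```csharp
using System;
using iTextSharp.text.pdf;

public static class StreamLister
{
    public static void Main(string[] args)
    {
        // "input.pdf" is a placeholder; pass a real path as the first argument.
        PdfReader reader = new PdfReader(args.Length > 0 ? args[0] : "input.pdf");
        try
        {
            // Object number 0 is reserved, so start at 1.
            for (int i = 1; i < reader.XrefSize; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream())
                    continue;

                PRStream stream = (PRStream)obj;
                PdfObject filter = stream.Get(PdfName.FILTER); // null if unfiltered
                byte[] raw = PdfReader.GetStreamBytesRaw(stream);
                Console.WriteLine("Object {0}: filter={1}, raw length={2}", i, filter, raw.Length);

                try
                {
                    byte[] decoded = PdfReader.GetStreamBytes(stream);
                    Console.WriteLine("  decoded length={0}", decoded.Length);
                }
                catch (Exception e)
                {
                    // Some streams (images, for instance) may use filters that are not decoded here.
                    Console.WriteLine("  could not decode: {0}", e.Message);
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Once the listing shows the object number you are after, the two-line PRStream snippet from the answer retrieves that stream directly.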

Score: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/40762055
