PyPDF2 encoding issue: PyPDF2.utils.PdfReadError: Illegal character in Name Object

Author: 全栈程序员站长
Published 2022-09-10 09:10:16

Hello everyone, we meet again. I'm your friend 全栈君.

Reference: https://github.com/mstamy2/PyPDF2/issues/438

Merging PDF files with PyPDF2 raised the following error:

Traceback (most recent call last):
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\projects\myproject\apps\backstage\views\busi_contract_manage_view.py", line 703, in post
    merge_pdf_result = merge_pdf(final_files, pdf_path)
  File "D:\projects\myproject\apps\utils\doc_convert_util.py", line 86, in merge_pdf
    pdf_writer.write(new_file)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object
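To see why name.decode('utf-8') fails, here is a minimal reproduction with made-up bytes. It assumes, purely for illustration, that the offending Name Object is a subset font name whose Chinese part ("宋体", SimSun) was stored as GBK rather than UTF-8; the name /ABCDEF+宋体 is hypothetical and not taken from the author's file. With an 8-byte ASCII prefix, UTF-8 decoding fails on byte 0xcb at position 8, the same message as the first exception in the traceback, while GBK decodes it cleanly.

```python
# Hypothetical Name Object bytes: an ASCII subset-font prefix followed by
# "宋体" (SimSun) encoded as GBK instead of UTF-8.
raw_name = b"/ABCDEF+" + "宋体".encode("gbk")


def try_decode(raw: bytes, encoding: str):
    """Decode raw with the given encoding; print the error and return None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        print(f"{encoding} failed: {exc}")
        return None


utf8_result = try_decode(raw_name, "utf-8")  # fails at byte 0xcb, position 8
gbk_result = try_decode(raw_name, "gbk")     # succeeds
print(gbk_result)
```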

Find the file that raises the error:

File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484
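If the traceback path is not at hand, the installed file can also be located from Python. The helper below, locate_module_file, is a hypothetical name and not part of PyPDF2; it returns None when the module is not installed. Line numbers such as 484 here and 238 in utils.py depend on the installed PyPDF2 version, so check the surrounding code before editing.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_module_file(module_name: str) -> Optional[Path]:
    """Return the source file of an installed module, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # find_spec imports the parent package first ("PyPDF2" for
        # "PyPDF2.generic") and raises if that parent is missing.
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


for name in ("PyPDF2.generic", "PyPDF2.utils"):
    print(name, "->", locate_module_file(name))
```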

Original code around line 484:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")
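As an aside, the comment in this except block refers to the PDF syntax rule that a name can carry arbitrary bytes written as '#' followed by two hex digits. The sketch below illustrates that idea independently of PyPDF2; it is not what the library does in this branch, and escape_pdf_name / unescape_pdf_name are hypothetical helpers. Escaping every byte outside printable ASCII, plus delimiters and '#' itself, keeps the original bytes intact without having to guess an encoding.

```python
# Bytes that must not appear literally in a PDF name: delimiters and the
# '#' escape character itself. Whitespace and other bytes outside
# 0x21-0x7E are escaped by the range check below.
NAME_DELIMITERS = b"()<>[]{}/%#"


def escape_pdf_name(raw: bytes) -> str:
    """Serialize raw name bytes (without the leading '/') using #xx escapes."""
    parts = ["/"]
    for byte in raw:
        if 0x21 <= byte <= 0x7E and byte not in NAME_DELIMITERS:
            parts.append(chr(byte))
        else:
            parts.append(f"#{byte:02X}")
    return "".join(parts)


def unescape_pdf_name(name: str) -> bytes:
    """Reverse escape_pdf_name: turn '#xx' sequences back into raw bytes."""
    body = name[1:] if name.startswith("/") else name
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "#":
            out.append(int(body[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)


raw = b"ABCDEF+" + "宋体".encode("gbk")  # hypothetical GBK font name
escaped = escape_pdf_name(raw)
print(escaped)  # /ABCDEF+#CB#CE#CC#E5
```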

Inside the except block, add a second attempt that decodes the name as GBK:

return NameObject(name.decode('gbk'))

After the change:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
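The patched logic (UTF-8 first, then GBK, then the original strict/non-strict handling) can be tried outside PyPDF2 with a standalone function. decode_name and its encodings parameter are hypothetical, and ValueError plus a plain warnings.warn stand in for PyPDF2's PdfReadError and PdfReadWarning. Keep in mind that GBK accepts a wide range of byte pairs, so a successful GBK decode does not prove the bytes were really GBK.

```python
import warnings
from typing import Sequence, Union


def decode_name(raw: bytes, strict: bool = True,
                encodings: Sequence[str] = ("utf-8", "gbk")) -> Union[str, bytes]:
    """Try each encoding in order, mirroring the patched NameObject.readFromStream."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # fall through to the next candidate encoding
    # No candidate worked: same policy as the original except branch.
    if not strict:
        warnings.warn("Illegal character in Name Object")
        return raw  # PyPDF2 wraps the undecoded bytes in NameObject here
    raise ValueError("Illegal character in Name Object")


print(decode_name(b"/Type"))                            # ASCII, UTF-8 works
print(decode_name(b"/ABCDEF+" + "宋体".encode("gbk")))  # falls back to GBK
```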

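Even after this change the error still occurs, so one more place has to be modified. The GBK-decoded name now contains Chinese characters, and when the writer serializes the name, the b_() helper in utils.py encodes it as Latin-1, which cannot represent those characters.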

Lib/site-packages/PyPDF2/utils.py, line 238

Original code:

r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r
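Latin-1 maps only code points U+0000 to U+00FF, one byte each, so any string containing a character beyond that range raises UnicodeEncodeError. A quick check of how the original line behaves; latin1_or_none is a hypothetical helper that returns None where the original b_() would raise:

```python
def latin1_or_none(s: str):
    """Encode like the original b_(); return None where it would raise."""
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        return None


print(latin1_or_none("/Type"))         # b'/Type'
print(latin1_or_none("/Café"))         # b'/Caf\xe9' (U+00E9 is inside Latin-1)
print(latin1_or_none("/ABCDEF+宋体"))  # None: U+5B8B is outside Latin-1
```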

Modified code:

try:
    r = s.encode('latin-1')
except UnicodeEncodeError:
    # Characters outside Latin-1 (e.g. GBK-decoded Chinese) fall back to UTF-8
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
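One consequence of this fallback is that the bytes written for such a name are not the bytes that were read: a name read as GBK is written back as UTF-8. The simplified stand-in below (cache omitted; the original bytes are hypothetical) shows the difference.

```python
def b_(s):
    """Simplified stand-in for the patched PyPDF2 helper (B_CACHE omitted)."""
    if isinstance(s, bytes):
        return s
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        return s.encode("utf-8")  # fallback added by the patch


original = b"/ABCDEF+" + "宋体".encode("gbk")  # hypothetical bytes in the source PDF
name = original.decode("gbk")                  # what the patched generic.py produces
written = b_(name)                             # what the writer emits

print(original)
print(written)
print(written == original)  # False: GBK in, UTF-8 out
```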

Published by 全栈程序员栈长. When reposting, please credit the source: https://javaforall.cn/152402.html
Original link: https://javaforall.cn

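This article is part of the Tencent Cloud Self-Media Sync Exposure Program and was shared from the author's personal site/blog.
In case of infringement, contact cloudcommunity@tencent.com for removal.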