Q: How do I extract tables from a PDF using camelot?

Stack Overflow user

Asked 2020-05-27 21:39:54

Answers 2 · Views 953 · Followers 0 · Votes 1

I want to extract all the tables from a PDF using camelot in Python 3.

import camelot
# PDF file to extract tables from
file = "./pdf_file/ooo.pdf"
tables = camelot.read_pdf(file)
# number of tables extracted
print("Total tables extracted:", tables.n)
# print the first table as Pandas DataFrame
print(tables[0].df)
# export individually
tables[0].to_csv("./pdf_file/ooo.csv")

But this gives me only one table, from the first page of the PDF. How can I extract all the tables in the PDF file?

pdf

python-camelot

python

csv

2 Answers

Stack Overflow user

Answered 2020-05-29 16:40:31

tables = camelot.read_pdf(file, pages='1-end')

If the pages parameter is not specified, Camelot analyzes only the first page. See the official documentation for a fuller explanation.
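To tie this back to the question's script, here is a minimal sketch that reads every page and writes each table to its own CSV file. The helper names `read_all_tables` and `export_tables` and the `table_N.csv` naming scheme are assumptions made for illustration; only `camelot.read_pdf`, the `pages` argument, and the per-table `to_csv` call come from the thread.

```python
from pathlib import Path


def read_all_tables(pdf_path, pages="1-end"):
    """Run Camelot over every page instead of only the first one."""
    # Imported lazily so export_tables below can be used without Camelot.
    import camelot

    return camelot.read_pdf(str(pdf_path), pages=pages)


def export_tables(tables, out_dir):
    """Write each table to its own CSV file and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, table in enumerate(tables, start=1):
        # Hypothetical naming scheme: table_1.csv, table_2.csv, ...
        target = out_dir / f"table_{index}.csv"
        table.to_csv(str(target))  # same per-table call the question uses
        written.append(target)
    return written
```

With the path from the question, this could be called as `export_tables(read_all_tables("./pdf_file/ooo.pdf"), "./pdf_file/csv")`, where the output directory is again just an example.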

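Votes 1

Stack Overflow user

Answered 2021-07-07 16:39:27

To extract PDF tables with camelot, you can use the code below. Use the stream flavor, which is very powerful and can detect almost any PDF table. If you still run into problems during extraction, add the row_tol and edge_tol parameters, for example row_tol=0 and edge_tol=500.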

import camelot

pdf_archive = camelot.read_pdf(file_path, pages="all", flavor="stream")

# Each item is one detected table (not one page); print it as a DataFrame
for pdf_table in pdf_archive:
    print(pdf_table.df)
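The tolerance settings mentioned above can be passed straight to `read_pdf` as keyword arguments. The following is only a sketch: the helper names `stream_options` and `read_with_stream` are made up for illustration, and the values 0 and 500 are simply the ones suggested in this answer, not recommended defaults.

```python
def stream_options(row_tol=None, edge_tol=None):
    """Build keyword arguments for camelot.read_pdf with the stream flavor."""
    options = {"pages": "all", "flavor": "stream"}
    # Only pass a tolerance when the caller sets one, so Camelot's own
    # defaults apply otherwise.
    if row_tol is not None:
        options["row_tol"] = row_tol
    if edge_tol is not None:
        options["edge_tol"] = edge_tol
    return options


def read_with_stream(pdf_path, row_tol=None, edge_tol=None):
    """Read all pages with the stream flavor and optional tolerances."""
    import camelot  # imported lazily so stream_options works without it

    return camelot.read_pdf(str(pdf_path), **stream_options(row_tol, edge_tol))
```

For example, `read_with_stream(file_path, row_tol=0, edge_tol=500)` would apply the values from this answer; which values work best likely depends on the layout of the PDF, so it may take a few tries.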

Votes 0

The original content of this page is provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/62044535