Q: Count the number of images in a PDF using Python

Stack Overflow user

Asked 2021-07-15 15:32:25

1 answer, 55 views, 0 followers, 0 votes

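I am trying to use Python to count the number of images in a PDF and write the results to a CSV file. Ideally, I would like a CSV with a column for the file name and a column for each page, holding the number of images on that page. However, a column for the file name and a column for the total number of images in the document would be enough.

I tried: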

import fitz
import io
from PIL import Image
import csv

with open(r'output.csv', 'x', newline='', encoding='utf-8') as csvfile:
    # Declaring the writer 
    propertyWriter = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    # Writing the headers 
    propertyWriter.writerow(['file', 'results', 'error'])
    for file in pdfs:

        # open the file
        pdf_file = fitz.open(file)


        # printing number of images found in this page
        if image_list:
            results = len(image_list[0])
            error = ""
            #print(results)
            #results = str(f"+ Found a total of {len(image_list)} images in page {page_index}")

        else:
            error = str("! No images found on page", page_index)
        propertyWriter.writerow([file, results, error])

Reference: https://www.geeksforgeeks.org/how-to-extract-images-from-pdf-in-python/ However, this approach reports 9 images in every PDF, which is not the case.

Then I tried:

import fitz
import csv
with open(r'output.csv', 'x', newline='', encoding='utf-8') as csvfile:
    # Declaring the writer 
    propertyWriter = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    # Writing the headers 
    propertyWriter.writerow(['file', 'results'])
    for file in pdfs[0:5]:
        for i in range(len(doc)):
            for img in doc.getPageImageList(i):
                xref = img[0]
                pix = fitz.Pixmap(doc, xref)
                results = str(pix)

    propertyWriter.writerow([file, results])

Reference: "Extract images from PDF without resampling, in python?" But this again reports the same number of images in every PDF, which is not the case.

python

python-3.x

pdf

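1 Answer

Stack Overflow user

Accepted answer

Posted 2021-07-15 17:32:03

I tried the first reference you mentioned (https://www.geeksforgeeks.org/how-to-extract-images-from-pdf-in-python/) and the code on that page works fine. What is the problem? It counts the images on each page of the PDF, so you only need to add the per-page counts together to get the total for each PDF.

If you put this inside a for loop over your files, you should be able to reach your goal: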

import fitz
import io
from PIL import Image

file = "doctest.pdf"
pdf_file = fitz.open(file)
results = 0

for page_index in range(len(pdf_file)):
    # get_images() is the current PyMuPDF name; older releases spelled it getImageList()
    image_list = pdf_file[page_index].get_images()

    # add the number of images found on this page to the running total
    results += len(image_list)

print("Total images in this PDF: ", results)
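
To connect this back to the CSV output asked for in the question, here is a minimal sketch rather than a definitive implementation. It assumes a reasonably recent PyMuPDF where `page.get_images()` is available. The function names `count_images_per_page` and `write_image_counts`, the `counter` parameter, and the column names are illustrative choices, not part of the original code. The file is opened in `"w"` mode, which overwrites an existing file, whereas the question used `'x'`.

```python
import csv


def count_images_per_page(pdf_path):
    """Return a list with the number of image entries on each page."""
    # Imported here so the CSV logic below can be exercised without PyMuPDF.
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return [len(page.get_images()) for page in doc]


def write_image_counts(pdf_paths, csv_path, counter=count_images_per_page):
    """Write one CSV row per PDF: file, total, per-page counts, error text."""
    with open(csv_path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
        writer.writerow(["file", "total_images", "images_per_page", "error"])
        for pdf_path in pdf_paths:
            try:
                per_page = counter(pdf_path)
            except Exception as exc:
                # Record the failure and carry on with the next file.
                writer.writerow([str(pdf_path), "", "", str(exc)])
                continue
            # The row is written inside the loop, so every file gets its own
            # freshly computed counts.
            writer.writerow([
                str(pdf_path),
                sum(per_page),
                ";".join(str(n) for n in per_page),
                "",
            ])
```

A call such as `write_image_counts(pdfs, "output.csv")`, with `pdfs` being the list of paths from the question, would then produce one row per file. Per-page counts are joined with `;` in a single column because PDFs have different page counts; one column per page would work too if all rows are padded to the same width.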

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/68389667
