Using Python to Pass an Open-Book Exam Based on PPTs

赵云龙龙

Published 2021-11-16 14:41:07

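I have a probation-to-regular exam coming up. The exam is open-book, and nearly all of the content is in PPTs from past years: company rules and regulations, various processes, technical documents from every department, a great many PPTs. Finding an answer takes a lot of effort. So I thought of using Python: first dump every PPT into one Excel file that records what each PPT contains and on which page, which makes things easy to find.

So I got right to it.

Installation: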

pip install python-pptx

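First, a look at what the basic parts of a PPT correspond to in Python:

Slide: a slide, i.e. one page of the presentation.

Shape: a box inserted into a slide; it can be a drawn shape or a text box.

Run: a run of text, usually a small number of characters.

Paragraph: a paragraph, usually carrying a bullet or number such as •, 1., and so on.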

1. Create a pptx document object and insert a slide

from pptx import Presentation
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[1])
# make changes to the ppt here
prs.save('python-pptx.pptx')

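prs.slide_layouts holds the slide layouts that come with the template, indexed from 0; how many there are depends on the template. In the default template, index 6 (the seventh layout) is the blank slide.

For example, slide_layouts[1] is a slide with a title and a body text box, and slide_layouts[6] is a blank slide.

slide is the object that represents one page, an instance of the slide class.

After making changes, prs.save('name.pptx') saves the ppt.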

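2. Add text to the text boxes of the slide just created

body_shape = slide.shapes.placeholders # body_shape holds all the placeholders on this slide
body_shape[0].text = 'this is placeholders[0]' # put text into the text frame of the first text box
body_shape[1].text = 'this is placeholders[1]' # put text into the text frame of the second text box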

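In a PPT, every element is treated as a shape. slide.shapes is the collection of shapes on the slide, and placeholders holds the placeholder shapes among them. slide_layouts[1] contains two text boxes, so print(len(slide.shapes.placeholders)) prints 2.

Or: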

title_shape = slide.shapes.title # get the title of this slide
title_shape.text = 'this is a title' # write text into the title text box
subtitle = slide.shapes.placeholders[1] # get the second text box on this slide
subtitle.text = 'this is a subtitle' # write text into the second text box

Because slide_layouts[1] contains a title and a body text box, the title box can also be reached directly through slide.shapes.title and written to.

3. Add a new paragraph to a text box

from pptx.util import Pt
new_paragraph = body_shape[1].text_frame.add_paragraph() # add a new paragraph to the text frame of the second shape
new_paragraph.text = 'add_paragraph' # text of the new paragraph
new_paragraph.font.bold = True # bold
new_paragraph.font.italic = True # italic
new_paragraph.font.size = Pt(15) # font size
new_paragraph.font.underline = True # underline
new_paragraph.level = 1 # indent level of the new paragraph

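The text of a paragraph created with add_paragraph supports font changes; Pt from pptx.util sets the font size.

4. Add a new text box

from pptx.util import Inches
left = top = width = height = Inches(5) # preset position and size
textbox = slide.shapes.add_textbox(left, top, width, height) # left and top are the position relative to the slide's top-left corner; width and height are the text box size
textbox.text = 'this is a new textbox' # text in the text box
new_para = textbox.text_frame.add_paragraph() # add a paragraph to the new text box
new_para.text = 'this is second para in textbox' # paragraph text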
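Positions and sizes are stored as integers in English Metric Units (EMU): 914400 EMU per inch and 12700 EMU per point. The sketch below redoes that arithmetic with plain integers so it runs without python-pptx. The helpers inches, pt and fits_on_slide are hypothetical stand-ins, and the 10 x 7.5 inch slide size is an assumption that matches the default 4:3 template.

```python
EMU_PER_INCH = 914400  # English Metric Units in one inch
EMU_PER_PT = 12700     # English Metric Units in one point


def inches(value):
    """Hypothetical stand-in for pptx.util.Inches: inches to integer EMU."""
    return round(value * EMU_PER_INCH)


def pt(value):
    """Hypothetical stand-in for pptx.util.Pt: points to integer EMU."""
    return round(value * EMU_PER_PT)


# Assumption: a 4:3 slide of 10 x 7.5 inches.
SLIDE_WIDTH, SLIDE_HEIGHT = inches(10), inches(7.5)


def fits_on_slide(left, top, width, height):
    """Return True if the box lies entirely inside the slide area."""
    return (left >= 0 and top >= 0
            and left + width <= SLIDE_WIDTH
            and top + height <= SLIDE_HEIGHT)


print(inches(5), pt(15))
# The text box from step 4: 5 inches from the left and top, 5 inches wide and tall.
print(fits_on_slide(inches(5), inches(5), inches(5), inches(5)))
# The same box moved to 1 inch from the left and top.
print(fits_on_slide(inches(1), inches(1), inches(5), inches(5)))
```

Under that slide-size assumption, the 5-inch box placed 5 inches from the top runs past the bottom edge, so a smaller top value may be a better fit on a real slide.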

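5. Add a picture

from pptx.util import Inches
img_path = 'img_path.jpg' # path to the image file
left, top, width, height = Inches(1), Inches(4.5), Inches(2), Inches(2) # preset position and size
pic = slide.shapes.add_picture(img_path, left, top, width, height) # add the picture at the given position with the preset size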

6. Add shapes

from pptx.util import Inches
from pptx.enum.shapes import MSO_SHAPE
left, top, width, height = Inches(1), Inches(3), Inches(1.8), Inches(1) # preset position and size
shape = slide.shapes.add_shape(MSO_SHAPE.PENTAGON, left, top, width, height) # add a PENTAGON shape at the given position with the preset size
shape.text = 'Step 1'
for n in range(2, 6):
    left = left + width - Inches(0.3) # move right, overlapping the previous shape by 0.3 inch
    shape = slide.shapes.add_shape(MSO_SHAPE.CHEVRON, left, top, width, height)
    shape.text = 'Step {}'.format(n)

MSO_SHAPE contains the various shape types available in Office; see: https://msdn.microsoft.com/en-us/library/office/ff862770(v=office.15).aspx
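To see where the loop in step 6 puts each shape, the same arithmetic can be worked through with plain integers (914400 EMU per inch). This is only a sketch of the position calculation, with a hypothetical inches helper in place of pptx.util.Inches, and it adds no shapes.

```python
EMU_PER_INCH = 914400  # English Metric Units in one inch


def inches(value):
    """Hypothetical stand-in for pptx.util.Inches: inches to integer EMU."""
    return round(value * EMU_PER_INCH)


left, width, overlap = inches(1), inches(1.8), inches(0.3)

# The first shape starts at `left`; each later one shifts by width minus overlap.
lefts = [left]
for _ in range(2, 6):
    left = left + width - overlap
    lefts.append(left)

print([x / EMU_PER_INCH for x in lefts])     # left edge of each shape, in inches
print((lefts[-1] + width) / EMU_PER_INCH)    # right edge of the last shape, in inches
```

Each shape starts 1.5 inches after the previous one, so the five shapes form one connected strip.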

7. Add a table

from pptx.util import Inches
rows, cols, left, top, width, height = 2, 2, Inches(3.5), Inches(4.5), Inches(6), Inches(0.8)
table = slide.shapes.add_table(rows, cols, left, top, width, height).table # add a table and get the table object
table.columns[0].width = Inches(2.0) # width of the first column
table.columns[1].width = Inches(4.0) # width of the second column
table.cell(0, 0).text = 'text00' # write text at the given position
table.cell(0, 1).text = 'text01'
table.cell(1, 0).text = 'text10'
table.cell(1, 1).text = 'text11'

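With that background, I got started: write everything to Excel with pandas and search the Excel file for whatever I need. If the answer is right there, great; if not, at least I know the location and can open the PPT to look.

And so I wrote this code: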

from pptx import Presentation
import os
import pandas as pd

path_to_presentation=r"C:\Users\Anderson.xie\Downloads\Courses"
path_to_excel=r"C:\Users\Anderson.xie\Downloads"


text_runs = []

def get_ppt_text(ppt_path):
    prs = Presentation(ppt_path)
    each_page=[]
    for slide in prs.slides:
        s=""
        for shape in slide.shapes:
            shapes=""
            if not shape.has_text_frame:
                continue

            for paragraph in shape.text_frame.paragraphs:
                pa=""
                for run in paragraph.runs:
                    pa+=run.text+" "
                shapes+=pa
            s+=shapes
        print(s)
        each_page.append(s)
    each_page.insert(0,ppt_path)
    text_runs.append(each_page)



def traverse_folder(files):
    if os.path.isfile(files) and files.endswith(".pptx"):
        get_ppt_text(files)

    # ppt_list=[]
    # if os.path.isdir(root_path):
    #     ppt_list= [x for x in os.listdir(root_path) if x.endswith(".pptx")]
    #     print(ppt_list)
    #
    # if ppt_list !=[]:
    #     for x in ppt_list:
    #         get_ppt_text(os.path.join(root_path,x))


def listdir(path, list_name):  # list_name is the list that collects the file paths
    for file in os.listdir(path):
        file_path = os.path.join(path, file)
        if os.path.isdir(file_path):
            listdir(file_path, list_name)
        else:
            list_name.append(file_path)




if __name__ == "__main__":
    list_path=[]
    listdir(path_to_presentation,list_path)
    print(list_path)
    print(len(list_path))
    for i in list_path:
        traverse_folder(i)
    print(text_runs)
    if text_runs != []:
        print(text_runs)
        df = pd.DataFrame(text_runs)
        print(df)
        df.to_excel(os.path.join(path_to_excel,"result.xlsx"))

    else:
        print("no ppt file was found!")
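The recursive listdir plus the .pptx check in traverse_folder could also be written with os.walk from the standard library. This is an alternative sketch, not the code used above; find_pptx_files is a made-up name, and skipping names that start with "~$" (the lock files PowerPoint leaves next to an open presentation) is an added precaution.

```python
import os


def find_pptx_files(root):
    """Return the sorted paths of all .pptx files under root, searched recursively."""
    found = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            # Skip "~$" lock files; they are not real presentations.
            if name.lower().endswith(".pptx") and not name.startswith("~$"):
                found.append(os.path.join(folder, name))
    return sorted(found)
```

With this helper, the main block could loop over find_pptx_files(path_to_presentation) and call get_ppt_text on each path.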

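Before the exam I had an Excel file full of answers and felt I was headed for a perfect score. Then the exam started and I was dumbfounded: the PPT locations other people had given me were wrong, so I still had to dig through by hand. Still, finding things in PPTs will be very convenient from now on.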
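For later lookups without opening Excel, the rows collected in text_runs can be searched directly. Each row starts with the file path followed by one string per slide, so the position in the row gives the slide number. The function name search_slides and the sample rows below are made up for illustration.

```python
def search_slides(rows, keyword):
    """Return (ppt_path, slide_number, text) for every slide whose text contains keyword."""
    keyword = keyword.lower()
    hits = []
    for row in rows:
        ppt_path, slide_texts = row[0], row[1:]
        for slide_number, text in enumerate(slide_texts, start=1):
            if keyword in text.lower():  # case-insensitive substring match
                hits.append((ppt_path, slide_number, text))
    return hits


# Hypothetical sample in the same layout as text_runs: [path, slide 1 text, slide 2 text, ...]
sample_rows = [
    ["hr/leave.pptx", "Leave policy ", "Annual leave is 10 days "],
    ["it/vpn.pptx", "VPN setup ", "Request access ", "Annual password reset "],
]

hits = search_slides(sample_rows, "annual")
for hit in hits:
    print(hit)
```

Calling search_slides(text_runs, "some keyword") after the script has run would list every file and slide number worth opening.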

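This article is part of the Tencent Cloud self-media sync exposure program and is shared from a WeChat official account.

Originally published: 2021-11-11. In case of infringement, contact cloudcommunity@tencent.com for removal.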
