The Complete Summary | Python Office Automation: PPT (Final Part)

AirPython

Published 2020-12-02 10:50:28



Part of the column: Python Automation


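1. Preface

This is the final article in the PPT installment of the office automation series. It covers the advanced features of PPT and other commonly used operations.

The article covers:

Preset shapes (Shape)
Charts (Chart)
Reading text content
Saving all images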

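2. Preset Shapes (Shape)

In fact, the content area of a PPT document is made up of various kinds of Shape objects: pictures, text boxes, videos, tables, and preset shapes.

The set of preset ordinary shapes is quite rich; the link at the end of this section lists them.

The following method inserts a shape into a slide: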

slide.shapes.add_shape(autoshape_type_id, left, top, width, height)

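The parameters are:

autoshape_type_id: the shape type
left: distance from the left edge of the slide
top: distance from the top edge of the slide
width: shape width
height: shape height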
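
python-pptx stores every length as a whole number of English Metric Units (EMU), and the Inches and Cm classes in pptx.util perform the conversion. The sketch below only mirrors that arithmetic in plain Python so the numbers are easy to inspect. The function names are illustrative and are not part of python-pptx.

```python
# Plain-Python illustration of the length units used by python-pptx.
# inches_to_emu and cm_to_emu are hypothetical names for this sketch;
# in real code use pptx.util.Inches and pptx.util.Cm instead.

EMU_PER_INCH = 914400  # English Metric Units per inch
EMU_PER_CM = 360000    # English Metric Units per centimeter


def inches_to_emu(value):
    """Convert inches to whole EMU."""
    return int(value * EMU_PER_INCH)


def cm_to_emu(value):
    """Convert centimeters to whole EMU."""
    return int(value * EMU_PER_CM)


# A shape placed 2 cm from the left and top edges, 16 cm wide and 8 cm high
left, top, width, height = (cm_to_emu(v) for v in (2, 2, 16, 8))
print(left, top, width, height)
```

Because left, top, width, and height all use the same unit, mixing Inches and Cm values in a single call still works as long as each value is converted before it is passed in.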

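As an example, let's insert a simple rounded rectangle.

2-1 Inserting a shape

from pptx.enum.shapes import MSO_SHAPE, MSO_SHAPE_TYPE
from pptx.util import Inches, Cm

def insert_shape(slide, left, top, width, height, autoshape_type_id=MSO_SHAPE.CHEVRON, unit=Inches):
    """
    Add a shape to a slide
    :param unit: length unit, Inches by default
    :param autoshape_type_id: shape type
    :param slide: slide
    :param left: left margin
    :param top: top margin
    :param width: width
    :param height: height
    :return: the new shape
    """
    # Add a shape
    # add_shape(self, autoshape_type_id, left, top, width, height)
    # Arguments: shape type, left margin, top margin, width, height
    shape = slide.shapes.add_shape(autoshape_type_id=autoshape_type_id,
                                   left=unit(left),
                                   top=unit(top),
                                   width=unit(width),
                                   height=unit(height))
    return shape

# 1. Add a rounded rectangle
rectangle = insert_shape(slide, 2, 2, 16, 8, autoshape_type_id=MSO_SHAPE.ROUNDED_RECTANGLE, unit=Cm)

2-2 Setting shape properties

The shape object returned by the function above can be configured further, for example its background color and border properties.

Here we set the background color to white and the border to red with a width of 0.5 cm:

# 2. Set shape properties
# set_widget_bg and set_widget_frame are helper functions that are not defined in this article
# 2.1 Background color
set_widget_bg(rectangle, bg_rgb_color=[255, 255, 255])

# 2.2 Border properties
set_widget_frame(rectangle, frame_rgb_color=[255, 0, 0], frame_width=0.5)

More shapes are listed at the link below:

https://python-pptx.readthedocs.io/en/latest/api/enum/MsoAutoShapeType.html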
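
The set_widget_bg and set_widget_frame helpers are not shown in this article. Below is a minimal sketch of what such helpers might look like, assuming they wrap the fill and line properties that python-pptx exposes on an auto shape. The names apply_fill and apply_outline and their parameters are hypothetical, and the real helpers may differ.

```python
# Hypothetical helpers; not the author's set_widget_bg / set_widget_frame.
# They rely only on attributes python-pptx provides on auto shapes:
# shape.fill (fill formatting) and shape.line (outline formatting).


def apply_fill(shape, color):
    """Give the shape a solid background of the given color."""
    # The fill must be made solid before a foreground color can be set
    shape.fill.solid()
    shape.fill.fore_color.rgb = color
    return shape


def apply_outline(shape, color, width):
    """Set the outline color and width of the shape."""
    shape.line.color.rgb = color
    shape.line.width = width
    return shape
```

With python-pptx the calls would look like apply_fill(rectangle, RGBColor(255, 255, 255)) and apply_outline(rectangle, RGBColor(255, 0, 0), Cm(0.5)), where RGBColor comes from pptx.dml.color and Cm from pptx.util.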

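3. Charts (Chart)

Charts are used very frequently in PPT. python-pptx can create many chart types, including column charts, pie charts, line charts, scatter charts, and 3D charts.

A chart is created as follows:

slide.shapes.add_chart(chart_type, x, y, cx, cy, chart_data)

The parameters are:

chart_type: the chart type
x: distance from the left edge of the slide
y: distance from the top edge of the slide
cx: displayed chart width
cy: displayed chart height
chart_data: the chart data object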

3-1 Creating a line chart

First, create a chart data object, ChartData:

from pptx.chart.data import ChartData

# add_slide is a helper that is not defined in this article
slide = add_slide(self.presentation, 6)

# Create a chart data object
chart_data = ChartData()

Next, prepare the chart data:

# Categories (x-axis data)
chart_data.categories = [2000, 2005, 2010, 2015, 2020]

# Data for each year along 3 dimensions
# Economy
chart_data.add_series("Economy", [60, 65, 75, 90, 95])

# Environment
chart_data.add_series("Environment", [95, 88, 84, 70, 54])

# Military
chart_data.add_series("Military", [40, 65, 80, 95, 98])
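
Data often arrives as one record per year rather than one list per series. The sketch below reshapes such records into the categories and per-series value lists that ChartData expects. The record layout and the function name split_records are assumptions made for illustration.

```python
def split_records(records, category_key):
    """Split row-style records into categories and per-series value lists.

    records: list of dicts that all share the same keys
    category_key: the key whose values become the x-axis categories
    """
    categories = [record[category_key] for record in records]
    series = {}
    for record in records:
        for name, value in record.items():
            if name == category_key:
                continue
            # Append this record's value to the list for its series
            series.setdefault(name, []).append(value)
    return categories, series


records = [
    {"year": 2000, "Economy": 60, "Environment": 95},
    {"year": 2005, "Economy": 65, "Environment": 88},
    {"year": 2010, "Economy": 75, "Environment": 84},
]
categories, series = split_records(records, "year")
print(categories)
print(series)
```

The result can then be handed to python-pptx by assigning categories to chart_data.categories and calling chart_data.add_series(name, values) for each item in series.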

Finally, set the chart type to a line chart, XL_CHART_TYPE.LINE, and draw the chart from the chart data.

To draw other chart types, see the link below:

https://python-pptx.readthedocs.io/en/latest/api/enum/XlChartType.html

from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches, Cm

def insert_chart(slide, left, top, width, height, data, unit=Inches, chart_type=XL_CHART_TYPE.COLUMN_CLUSTERED):
    """
    Insert a chart
    :param slide: slide
    :param left: left margin
    :param top: top margin
    :param width: width
    :param height: height
    :param data: chart data
    :param unit: length unit, Inches by default
    :param chart_type: chart type, clustered column chart by default
    :return: the chart object
    """
    chart_result = slide.shapes.add_chart(chart_type=chart_type,
                                          x=unit(left), y=unit(top),
                                          cx=unit(width), cy=unit(height),
                                          chart_data=data)
    # add_chart returns a graphic frame; return the chart it contains
    return chart_result.chart

# Add the chart
chart = insert_chart(slide, 4, 5, 20, 9, chart_data, unit=Cm, chart_type=XL_CHART_TYPE.LINE)

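3-2 Setting chart display properties

As examples, we configure the legend, line smoothing, and the text style of the chart:

# Set chart display properties
# Show the legend
chart.has_legend = True

# Place the legend outside the plot area
chart.legend.include_in_layout = False

# Draw each series as a smoothed line
chart.series[0].smooth = True
chart.series[1].smooth = True
chart.series[2].smooth = True

# Set the style of the text in the chart
# set_font_style is a helper that is not defined in this article
set_font_style(chart.font, font_size=12, font_color=[255, 0, 0])

The resulting line chart is shown below:

[Figure: the generated line chart]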

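4. Reading Content

The content area of a PPT document is made up of Shape objects, and shape.has_text_frame tells whether a shape contains a text frame.

So iterating over all shapes is enough to collect the text held in text frames throughout the PPT.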

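from pptx import Presentation

def read_ppt_content(presentation):
    """
    Read all text content in a PPT
    :param presentation: presentation object
    :return: list of text content, one entry per shape
    """
    # All content
    results = []

    # Iterate over all slides and collect the values in text frames
    for slide in presentation.slides:
        for shape in slide.shapes:
            # Check whether the shape contains a text frame
            if shape.has_text_frame:
                # get_shape_content is a helper that is not defined in this article
                content = get_shape_content(shape)
                if content:
                    results.append(content)

    return results

presentation = Presentation("./raw.pptx")

# 1. All text content of ordinary shapes
contents = read_ppt_content(presentation)
print(contents)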
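
get_shape_content is called above but is not defined in the article. One plausible version is sketched below, assuming it should return the non-empty paragraph texts of a shape joined by newlines. The actual helper may behave differently.

```python
def get_shape_content(shape):
    """Return the text of a shape's text frame, or '' if there is none.

    Assumed behavior for this sketch: blank paragraphs are skipped and
    the remaining paragraphs are joined with newlines.
    """
    if not shape.has_text_frame:
        return ""
    lines = []
    for paragraph in shape.text_frame.paragraphs:
        # Trim surrounding whitespace and drop empty paragraphs
        text = paragraph.text.strip()
        if text:
            lines.append(text)
    return "\n".join(lines)
```

Returning an empty string for shapes without text lets the caller's `if content:` check filter them out without a separate branch.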

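However, the text in the cells of a table (Table) cannot be retrieved this way.

Instead, we have to filter for shapes whose type is TABLE and iterate over every row and cell of the table to get the text.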

def read_ppt_file_table(self):
    """
    Read table data from a PPT
    :return:
    """
    # Open the PPT to read
    presentation = Presentation("./raw.pptx")

    for slide in presentation.slides:
        # Iterate over all shapes
        # Shapes may or may not have content
        for shape in slide.shapes:
            # print('Current shape type:', shape.shape_type)
            # Only take table data, reading the content row by row
            if shape.shape_type == MSO_SHAPE_TYPE.TABLE:
                # Get the table rows (shape.table.rows)
                for row in shape.table.rows:
                    # All cells in a row (row.cells)
                    for cell in row.cells:
                        # Text in the cell's text frame (cell.text_frame.text)
                        print(cell.text_frame.text)
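
Printing each cell loses the row structure. As a variation, the sketch below returns each table as a list of rows. It checks shape.has_table instead of comparing shape_type; this is a swapped-in technique, chosen on the assumption that it also catches tables that sit inside placeholders. The function name read_tables is hypothetical.

```python
def read_tables(presentation):
    """Return every table in the presentation as a list of rows of cell text."""
    tables = []
    for slide in presentation.slides:
        for shape in slide.shapes:
            # getattr guards against shape objects that lack has_table
            if not getattr(shape, "has_table", False):
                continue
            # One inner list per row, one string per cell
            rows = [
                [cell.text_frame.text for cell in row.cells]
                for row in shape.table.rows
            ]
            tables.append(rows)
    return tables
```

Each entry in the result is one table, so tables[0][1][2] would be the third cell of the second row of the first table found.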

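5. Saving Images

Sometimes we need to save all the images in a PPT document to local disk.

It takes only the following 3 steps:

Iterate over all shapes in the content area of each slide
Filter for picture shapes whose type is MSO_SHAPE_TYPE.PICTURE and get each picture's binary content
Write the image bytes to a file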

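import os

def save_ppt_images(presentation, output_path):
    """
    Save all images in a PPT
    Reference: Batch-exporting image assets from a PPT with Python, https://www.pythonf.cn/read/49552
    :param presentation: presentation object
    :param output_path: output directory
    :return:
    """

    print('Number of slides:', len(presentation.slides))

    # Iterate over all slides
    for index_slide, slide in enumerate(presentation.slides):
        # Iterate over all shapes
        for index_shape, shape in enumerate(slide.shapes):
            # Shapes include text shapes, pictures, ordinary shapes, and so on

            # Filter for picture shapes
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                # Get the image bytes
                image_data = shape.image.blob

                # image/jpeg, image/png, and so on
                image_type_pre = shape.image.content_type

                # Image file extension
                image_suffix = image_type_pre.split('/')[1]

                # Create the output folder for the extracted images
                if not os.path.exists(output_path):
                    os.makedirs(output_path)

                # Image save path
                # random_str is a helper that is not defined in this article
                # os.path.join also works when output_path has no trailing separator
                output_image_path = os.path.join(output_path, random_str(10) + "." + image_suffix)

                print(output_image_path)

                # Write to a new file
                with open(output_image_path, 'wb') as file:
                    file.write(image_data)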
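
The function above depends on a random_str helper that the article does not define. Below is a self-contained variation, under the assumption that predictable file names are acceptable: each file is named after its slide and shape position. It detects pictures by the presence of an image attribute instead of comparing shape_type, and it takes the extension from image.ext. export_images is a hypothetical name.

```python
import os


def export_images(presentation, output_dir):
    """Write every picture in the presentation to output_dir.

    Returns the list of file paths written.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for slide_no, slide in enumerate(presentation.slides, start=1):
        for shape_no, shape in enumerate(slide.shapes, start=1):
            # Assumption: only picture shapes expose an image attribute
            image = getattr(shape, "image", None)
            if image is None:
                continue
            name = "slide{}_shape{}.{}".format(slide_no, shape_no, image.ext)
            path = os.path.join(output_dir, name)
            with open(path, "wb") as file:
                file.write(image.blob)
            written.append(path)
    return written
```

Position-based names make repeated runs overwrite the same files instead of piling up new ones, which is usually what a batch export wants. Like the original function, this sketch does not descend into grouped shapes.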