How do I convert HTML to a Word docx in Python?

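In Python you can convert HTML to a Word docx file by parsing the HTML with BeautifulSoup and writing the result with the python-docx library. Here is a complete example: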

from docx import Document
from bs4 import BeautifulSoup

def html_to_docx(html_file, docx_file):
    # Read the HTML file
    with open(html_file, 'r', encoding='utf-8') as f:
        html_content = f.read()

    # Create an empty Word document
    doc = Document()

    # Parse the HTML content
    soup = BeautifulSoup(html_content, 'html.parser')

    # Collect all paragraph and heading elements, in document order
    paragraphs = soup.find_all(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'])

    # Add each element to the Word document
    for p in paragraphs:
        text = p.get_text()
        style = p.name

        if style.startswith('h'):
            # Heading: the digit in the tag name (h1..h6) is the heading level
            level = int(style[1])
            doc.add_heading(text, level=level)
        else:
            # Plain paragraph
            doc.add_paragraph(text)

    # Save the Word document
    doc.save(docx_file)

# Example usage
html_to_docx('input.html', 'output.docx')

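The code above uses python-docx to create and write the Word document and BeautifulSoup to parse the HTML. It reads the HTML file, creates an empty Word document, parses the HTML, and collects every paragraph and heading element. Each element is then added according to its tag name: h1 through h6 become headings of the matching level, and p becomes a plain paragraph. Finally, the document is saved.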
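The check style.startswith('h') is safe only because the tag list is limited to p and h1 through h6; if the list were extended to tags such as hr or header, int(style[1]) would fail. As a minimal sketch, the tag-to-level mapping could be pulled into a small helper. The names heading_level and HEADING_RE are hypothetical and not part of python-docx or BeautifulSoup:

```python
import re

# Hypothetical helper, not part of python-docx or BeautifulSoup.
# Matches exactly 'h1' through 'h6' and nothing else.
HEADING_RE = re.compile(r"^h([1-6])$")

def heading_level(tag_name):
    """Return the heading level (1-6) for 'h1'..'h6', or None for any other tag."""
    match = HEADING_RE.match(tag_name.lower())
    return int(match.group(1)) if match else None
```

Inside the loop, this would replace the startswith test: call heading_level(p.name), pass the result to doc.add_heading when it is not None, and fall back to doc.add_paragraph otherwise.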

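Note that this is only a simple example: it keeps just the text of p and h1 through h6 elements, so more complex HTML structure and styling will need further adjustments. To run the code you also need to install python-docx and BeautifulSoup (the beautifulsoup4 package). You can install them with the following command: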