Python: Converting from ISO-8859-1/latin1 to UTF-8

In Python, if you need to convert text from ISO-8859-1 (Latin-1) encoding to UTF-8 encoding, you can use the following methods:

Method 1: Direct encoding conversion (for string objects)

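# In Python 3 a str holds Unicode code points and has no encoding of its own
iso_string = "Café au lait"

# Round trip: encode to ISO-8859-1 bytes, decode back to str, then encode as UTF-8
utf8_bytes = iso_string.encode('iso-8859-1').decode('iso-8859-1').encode('utf-8')

# Note: the round trip does not change the text, because ISO-8859-1 maps bytes
# 0x00-0xFF directly to code points U+0000-U+00FF (and it raises UnicodeEncodeError
# for any character above U+00FF). Since the str is already Unicode, it is simpler
# to encode it as UTF-8 directly:
utf8_bytes = iso_string.encode('utf-8')

print(utf8_bytes)  # a bytes object: b'Caf\xc3\xa9 au lait'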
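You can check directly that the ISO-8859-1 round trip leaves the text unchanged. The sketch below encodes and decodes every code point from U+0000 to U+00FF and compares the result with the input. It also shows that a character outside that range, such as € (U+20AC), has no ISO-8859-1 byte. The helper name `latin1_roundtrip` is only for illustration.

```python
def latin1_roundtrip(text: str) -> str:
    """Hypothetical helper: encode to ISO-8859-1 and decode back again."""
    return text.encode('iso-8859-1').decode('iso-8859-1')

# All 256 code points that ISO-8859-1 covers
all_latin1 = ''.join(chr(i) for i in range(256))
print(latin1_roundtrip(all_latin1) == all_latin1)  # True

# Characters above U+00FF cannot be encoded as ISO-8859-1
try:
    '€'.encode('iso-8859-1')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)
```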

In practice, however, the more common case is:

Method 2: Handling files or byte data

If you read ISO-8859-1 encoded data from a file or another byte stream:

# Convert ISO-8859-1 encoded bytes to UTF-8
iso_bytes = b'Caf\xe9 au lait'  # bytes encoded as ISO-8859-1

# Decode to a Unicode str first, then encode as UTF-8
unicode_str = iso_bytes.decode('iso-8859-1')
utf8_bytes = unicode_str.encode('utf-8')

print(utf8_bytes)  # b'Caf\xc3\xa9 au lait'
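The decode-then-encode step can be wrapped in a small function. `latin1_to_utf8` is a hypothetical name, and the sketch assumes the input really is ISO-8859-1. It also shows the size change. Bytes 0x00 to 0x7F are ASCII and stay one byte each. Bytes 0x80 to 0xFF each become two bytes in UTF-8.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8 bytes.

    Decoding as ISO-8859-1 cannot fail, so this only gives correct text
    if the input really is ISO-8859-1.
    """
    return data.decode('iso-8859-1').encode('utf-8')

iso_bytes = b'Caf\xe9 au lait'
utf8_bytes = latin1_to_utf8(iso_bytes)
print(utf8_bytes)                       # b'Caf\xc3\xa9 au lait'
print(len(iso_bytes), len(utf8_bytes))  # 12 13  (0xE9 became 0xC3 0xA9)
```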

Method 3: Direct conversion (if you already have a Unicode string)

If your string is already a Unicode string (the str type in Python 3), you can encode it to UTF-8 directly:

# A Unicode str
unicode_str = "Café au lait"

# Encode it directly as UTF-8
utf8_bytes = unicode_str.encode('utf-8')

print(utf8_bytes)  # b'Caf\xc3\xa9 au lait'
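UTF-8 bytes must be decoded with UTF-8. If they are decoded as ISO-8859-1 instead, you get the familiar "CafÃ©" mojibake. ISO-8859-1 maps bytes one-to-one onto U+0000 to U+00FF, so this particular mistake can be undone. The sketch below shows both effects on the example string.

```python
unicode_str = "Café au lait"
utf8_bytes = unicode_str.encode('utf-8')

# Correct: decode with the same codec that was used to encode
print(utf8_bytes.decode('utf-8') == unicode_str)  # True

# Wrong codec: UTF-8 bytes read as ISO-8859-1 produce mojibake
garbled = utf8_bytes.decode('iso-8859-1')
print(garbled)  # CafÃ© au lait

# Re-encoding as ISO-8859-1 recovers the original bytes, which then decode as UTF-8
repaired = garbled.encode('iso-8859-1').decode('utf-8')
print(repaired == unicode_str)  # True
```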

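Complete example (file operations)

If you need to process files:

# Read a file encoded in ISO-8859-1 and convert it to UTF-8
with open('input.txt', 'r', encoding='iso-8859-1') as f:
    content = f.read()  # content is already a Unicode str

# Write the content to a UTF-8 encoded file
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(content)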
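For large files you may not want to read everything into memory at once. The sketch below copies the file in fixed-size chunks instead. It assumes a hypothetical helper `convert_file`, and the 64 KiB `chunk_size` is an arbitrary choice. Passing `newline=''` to both `open()` calls keeps the original line endings as they are, for example `\r\n` stays `\r\n`.

```python
def convert_file(src: str, dst: str, chunk_size: int = 1 << 16) -> None:
    """Hypothetical helper: copy an ISO-8859-1 text file to UTF-8 in chunks."""
    with open(src, 'r', encoding='iso-8859-1', newline='') as fin, \
         open(dst, 'w', encoding='utf-8', newline='') as fout:
        while True:
            chunk = fin.read(chunk_size)  # up to chunk_size characters
            if not chunk:                 # empty string means end of file
                break
            fout.write(chunk)
```

Usage: `convert_file('input.txt', 'output.txt')`.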

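Notes

In Python 3, strings are Unicode by default (type str), and raw bytes have type bytes.
ISO-8859-1 (Latin-1) is a single-byte encoding: its 256 byte values map directly to the first 256 Unicode code points (U+0000 to U+00FF).
If you are not sure of the original encoding, you may need to try other encodings first, such as cp1252 (Windows-1252). Text labelled ISO-8859-1 is often really cp1252, which assigns printable characters such as € and curly quotes to bytes 0x80 to 0x9F.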
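One way to act on the cp1252 suggestion is to try a short list of candidate codecs in order and keep the first one that decodes without error. The sketch below is a heuristic, not a reliable encoding detector. `decode_with_fallback` and its default codec order are assumptions. ISO-8859-1 goes last because it accepts every byte and therefore never fails.

```python
from __future__ import annotations


def decode_with_fallback(data: bytes,
                         encodings=('utf-8', 'cp1252', 'iso-8859-1')) -> tuple[str, str]:
    """Hypothetical helper: return (text, codec) for the first codec that succeeds."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    raise ValueError('none of the codecs could decode the data')


print(decode_with_fallback(b'Caf\xc3\xa9'))  # ('Café', 'utf-8')
print(decode_with_fallback(b'\x80'))         # ('€', 'cp1252')
```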

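Common error handling

Decoding with ISO-8859-1 never raises UnicodeDecodeError. Every byte value from 0x00 to 0xFF maps to a character, so errors='ignore' and errors='replace' have no effect with this codec. These options matter when you decode with a stricter codec, for example cp1252 (which leaves a few bytes such as 0x81 undefined) or utf-8:

# Read the raw bytes once; the with block closes the file
with open('input.txt', 'rb') as f:
    raw = f.read()

# Skip bytes that cannot be decoded
content = raw.decode('cp1252', errors='ignore')

# Or replace them with U+FFFD, the Unicode replacement character
content = raw.decode('cp1252', errors='replace')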
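The sketch below shows where the errors= argument actually makes a difference. The sample bytes are made up for illustration. ISO-8859-1 assigns a character to every byte, so all error handlers give the same result. cp1252 has no character for byte 0x81, so the handler decides what happens to that byte.

```python
data = b'Caf\xe9 \x81'  # 0xE9 is 'é' in both codecs; 0x81 is unassigned in cp1252

# ISO-8859-1: the handler never runs, so every handler gives the same text
latin1_results = {h: data.decode('iso-8859-1', errors=h)
                  for h in ('strict', 'ignore', 'replace')}
print(len(set(latin1_results.values())))  # 1

# cp1252: strict decoding (the default) raises on 0x81
try:
    data.decode('cp1252')
except UnicodeDecodeError as exc:
    print('strict:', exc.reason)

ignored = data.decode('cp1252', errors='ignore')    # the 0x81 byte is dropped
replaced = data.decode('cp1252', errors='replace')  # the 0x81 byte becomes U+FFFD
print(ignored == 'Café ', replaced == 'Café \ufffd')  # True True
```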