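I'm trying to read the data in a draw.io drawing using Python.
Apparently, the format is XML, with some parts in the "mxfile" encoding
(that is, part of the XML is compressed and then base64-encoded).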
Here's the official TFM: https://drawio-app.com/extracting-the-xml-from-mxfiles/
and their online decoder tool: https://jgraph.github.io/drawio-tools/tools/convert.html
So I tried to decode the mxfile part with the standard Python tools:
import base64
import zlib
s="7VvbcuI4FPwaHpOybG55BHKZmc1kmSGb7KvAArTIFiuLEObr58jINxTATvA4IVSlKtaxLFvq1lGrbWpOz3u+EXg+/c5dwmq25T7XnMuabSOr3YR/KrJaR9pIByaCurpSEhjQXyS6UkcX1CVBpqLknEk6zwZH3PfJSGZiWAi+zFYbc5a96xxPiBEYjDAzo4/UlVPdC7uVxL8QOplGd0bNi/UZD0eVdU+CKXb5MhVyrmpOT3Au10fec48wNXjRuDx+XT2y21nz5tuP4H/8T/ev+7uHs3Vj10UuibsgiC9f3fSv2fj6y0P9v3/n/esfS+umM/x2pi+xnjBb6PHqExFwX/dYrqJhDJbUY9iHUnfMfTnQZ2AQupjRiQ/HI3g6IiDwRISkgEBHn5B8DtHRlDL3Fq/4QvUhkHg0i0rdKRf0FzSLGZxCEIDTQmoy2c1MjYG6EsIWRAUJoE4/GhgUh25xIHWdEWcMzwM6DB9YVfGwmFC/y6XkXtQQX/gucXUpRjosSMFnMXfU9Tnh0LCp0SDPKTJqeG4I94gUK6iiz8ZM01MNReVlQlzU1LFpmrROW08YPVkmcdvx7X7C5ML+BAYhuZ+zcb96zvvZzeztMAPgfSxJVw1jkKYhHKS6moRCchYgKjKIeoc9YtAURlqmKMnIWG4lZDDHI+pPbsM6l/Uk8lP3VIU4XDtmIRmm1HWJH5JFYonXfFIMmXPqy3AoGl34gwHrWeeNWgMeqAdllJThT1UXssd94BWmIYEIkHVJFGFfoNbOabufWqssYkWRTRMpA2lR/Gwz0Uy5r8h4t/CGkDaODckdGWUqPaYPy8K7YVeMt2PgfeVhqi7ruC7k6OAE+EEBb7UrBrxuAG4gzGioH/RooBfX1j3wewCkai7C+17R4fIMGZxwTE44L+DP8JCwPg+opFy1L9Z1N3hRVdZGVj0fqjuW/zeB2jCz9kKMpjhQiRtk1wyGNzw6wvlcGqio6tzcNFAdyIWruplT9Vsn1X841Y82VL/TLFf1ow3V77Tfr+pvbWfqserGnGmnmZtm72UH0Daw7MDTK/fGtr7DUnJ0SB5UEBbGu/IdwMVJEB4c1Lwqvyw9iEy/8CskfusK0AiXWtu656rsC65aO7IZndZA9bIwbledqJHptd0QteIOiEd9LBTg93hGTJP4o+NbFqTVS/7oAXZlY+K7HfXCBUpDxpXa7kJIy3FkrYvXlEUr1x69nF3+iDsh0dQhbMiXV0mgGwbgRMSUwmo74LAtJfshg/3FhOTYzamn3QnsS0AKwrCkT9n3Tju0eV8RN9HltpXV5bblZJtYd1JflX7RU7Sh9SgYDR3Mqje9v77gYxIE3JTrpx1m+TtMZ3PHl3eH2bL2kviFDaZTz7HBbL2PDSYybcsBZlhn3E+4tsWT9+NsLJHpUhroffadRnFY8+4fS9tqmC7lp1IsEWLvWrKgjUzfeqVkcTYaslsbz1K2ZDGNxm2vKU+CpXzB0rDaGTrk/hDGRjsWme2KpdH4QB/CmD7qQApCzJc3n0WxtHLT690oFtMb7VF5fJrzoA54cZwrt8Bt0y6FpC2P77O1ioGu/OMX27RMQdmrVdy2etw9AX5gwHN/GFMe4qah2oMxkUfoHFSNtfNKMXY4rE1D0wD50xsMxXFt5JRhZTkMtun9PQBE7jEu0OWh2Kw8E5v2398LOV8oe6Gj3lXeqnlwQjQ3oheV59ti1h+fh2NdzNyLfUFUvdWnx3av0xdhudfq0zgrKqVtjbp+oDe6fvH7nJgwdraJvK5fo76noS2un9HQ2eYbp412+HgckFKMQ9s0Dq3z8wj4hK6hGZdKBHvSzlBbcus1vItHs0nI3x5nXMB5nycGpHa77fw5IZpf+ieX+rFq8c/P8ht1Z29kVETMPwaXaZ7lxyrSTx8VrMPM/uib3D8OnemZMeiFWuDxVu8zJcc3UTVVcB4HP9bou7Eu5KK/kRgGAbZxJf86cXEYpjhZFz9K0m/hChSTH1yvq
yc/W3eufgM="
result=zlib.decompress(base64.b64decode(s))
This raises an exception:
zlib.error: Error -3 while decompressing data: incorrect header check
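The error can be reproduced without a draw.io file. The sketch below (the sample bytes and the helper name raw_deflate are illustrative, not real draw.io data) builds a raw DEFLATE stream, i.e. one with no zlib header, then shows that zlib.decompress with its default wbits rejects it while wbits=-15 accepts it.

```python
import base64
import zlib

SAMPLE = b"<mxGraphModel/>"  # illustrative payload, not real draw.io data


def raw_deflate(data: bytes) -> bytes:
    # wbits=-15 -> raw DEFLATE: no 2-byte zlib header, no Adler-32 trailer
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


payload = base64.b64encode(raw_deflate(SAMPLE))  # roughly what ends up in the file
data = base64.b64decode(payload)                 # the step the question performs

try:
    zlib.decompress(data)  # default wbits=15 expects a zlib header
except zlib.error as exc:
    print("default wbits failed:", exc)

print("wbits=-15:", zlib.decompress(data, wbits=-15))
```

Note that wbits=47 (automatic zlib/gzip detection) would not help either, because it only recognizes streams that carry a zlib or gzip header.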
Yet the tool above returns the XML just fine when given exactly the same data.
What am I missing?
Posted on 2021-11-30 21:56:49
Try this:
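import zlib
import base64
import xml.etree.ElementTree as ET
from urllib.parse import unquote
filename = "diagram.drawio"  # placeholder: path to your .drawio file
tree = ET.parse(filename)
# <diagram> text = base64 of a raw-deflated, URL-encoded XML string
data = base64.b64decode(tree.find('diagram').text)
# wbits=-15: raw deflate stream, no zlib header or checksum
xml = zlib.decompress(data, wbits=-15)
# decode bytes to str before undoing the URL encoding
xml = unquote(xml.decode('utf-8'))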
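The snippet above only reads the first <diagram>. A slightly more general sketch follows, assuming a file may contain several pages and that some draw.io versions save a page uncompressed, as an <mxGraphModel> child element instead of base64 text (an assumption, not something the answer states). The function names decode_diagram_text and read_drawio are hypothetical.

```python
import base64
import zlib
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def decode_diagram_text(text: str) -> str:
    """Decode one compressed <diagram> payload into mxGraphModel XML."""
    raw = base64.b64decode(text)
    inflated = zlib.decompress(raw, wbits=-15)  # raw DEFLATE, no zlib header
    return unquote(inflated.decode("utf-8"))    # undo the URL (percent) encoding


def read_drawio(xml_source: str) -> dict:
    """Map each diagram's name to its mxGraphModel XML string."""
    root = ET.fromstring(xml_source)
    pages = {}
    for i, diagram in enumerate(root.iter("diagram")):
        name = diagram.get("name", f"page-{i + 1}")
        model = diagram.find("mxGraphModel")
        if model is not None:
            # assumed uncompressed layout: the model is already a child element
            pages[name] = ET.tostring(model, encoding="unicode")
        elif diagram.text and diagram.text.strip():
            # compressed layout: base64 text inside <diagram>
            pages[name] = decode_diagram_text(diagram.text.strip())
    return pages
```

To use it, read the .drawio file into a string (for example with open(path, encoding="utf-8").read(), where path is your file) and pass it to read_drawio; each value is an XML string that ET.fromstring can parse further.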
If you read the source code of their HTML tool, you'll see this:
data = String.fromCharCode.apply(null, new Uint8Array(pako.deflateRaw(data)));
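They use a JS library called pako in "raw" mode. Raw mode means the deflate stream has no zlib header or checksum, which is why zlib.decompress() with its default settings fails with "incorrect header check"; passing wbits=-15 tells zlib to expect a raw stream. The GitHub source gives you the rest of the settings you need.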
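Going the other way, for example to write a modified diagram back, the encoding side can be mirrored in Python. This is a sketch assuming the tool applies encodeURIComponent before deflateRaw and then base64 (the answer's unquote step suggests the URL-encoding layer). encode_diagram is a hypothetical name, and the compressed bytes need not match pako's output byte for byte; they only need to inflate to the same text.

```python
import base64
import zlib
from urllib.parse import quote, unquote


def encode_diagram(model_xml: str) -> str:
    """Hypothetical inverse of the decoding steps: URL-encode, raw-deflate, base64."""
    # safe= keeps the characters JavaScript's encodeURIComponent leaves unescaped
    encoded = quote(model_xml, safe="-_.!~*'()")
    compressor = zlib.compressobj(wbits=-15)  # raw DEFLATE, analogous to pako.deflateRaw
    raw = compressor.compress(encoded.encode("ascii")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    sample = '<mxGraphModel><root><mxCell id="0"/></root></mxGraphModel>'
    payload = encode_diagram(sample)
    # round trip using the same steps as the answer's decoder
    back = unquote(zlib.decompress(base64.b64decode(payload), wbits=-15).decode("ascii"))
    assert back == sample
    print(payload)
```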
https://stackoverflow.com/questions/70175214