Q: How do I perform HTML decoding/encoding using Python/Django?

Stack Overflow user

Asked 2008-11-08 20:44:31

15 answers, 203.7K views, 0 followers, 146 votes

I have a string that is HTML-encoded:

'''&lt;img class=&quot;size-medium wp-image-113&quot;\
 style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot;\
 src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot;\
 alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'''

I want to change that to:

<img class="size-medium wp-image-113" style="margin-left: 15px;" 
  title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" 
  alt="" width="300" height="194" />

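I want this to register as HTML so that the browser renders it as an image instead of displaying it as text.

The string is stored like that because I am using a web-scraping tool called BeautifulSoup; it "scans" a web page, extracts certain content from it, and returns the string in that format.

I have found how to do this in C#, but not in Python. Can someone help me out?

15 Answers

Stack Overflow user

Accepted answer

Answered 2008-11-08 21:40:38

Given the Django use case, there are two answers to this. Here is its django.utils.html.escape function, for reference: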

def escape(html):
    """Returns the given HTML with ampersands, quotes and carets encoded."""
    return mark_safe(force_unicode(html).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;'))

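To reverse this, the Cheetah function described in Jake's answer should work, but it is missing the single quote. This version includes an updated tuple, with the order of replacement reversed to avoid symmetry problems: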

def html_decode(s):
    """
    Returns the ASCII decoded version of the given HTML string. This does
    NOT remove normal HTML tags like <p>.
    """
    htmlCodes = (
            ("'", '&#39;'),
            ('"', '&quot;'),
            ('>', '&gt;'),
            ('<', '&lt;'),
            ('&', '&amp;')
        )
    for code in htmlCodes:
        s = s.replace(code[1], code[0])
    return s

unescaped = html_decode(my_string)
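To illustrate why the replacement order matters, here is a minimal sketch. The helper `html_decode_amp_first` is hypothetical and deliberately wrong; it exists only for contrast. The input is what escaping the literal text `&lt;` would produce.

```python
def html_decode(s):
    """Undo the five replacements made by django.utils.html.escape."""
    html_codes = (
        ("'", "&#39;"),
        ('"', "&quot;"),
        (">", "&gt;"),
        ("<", "&lt;"),
        ("&", "&amp;"),  # must come last, see below
    )
    for char, entity in html_codes:
        s = s.replace(entity, char)
    return s


def html_decode_amp_first(s):
    """Hypothetical, deliberately wrong ordering: '&amp;' is decoded first."""
    for char, entity in (("&", "&amp;"), ("<", "&lt;")):
        s = s.replace(entity, char)
    return s


# Escaping the literal text "&lt;" yields "&amp;lt;".
escaped = "&amp;lt;"

correct = html_decode(escaped)          # decodes exactly one level
wrong = html_decode_amp_first(escaped)  # decodes two levels by accident

print(correct)  # &lt;
print(wrong)    # <
```

Decoding `&amp;` first creates a fresh `&lt;` that the next replacement then decodes a second time, which is the symmetry problem the answer refers to.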

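This, however, is not a general solution; it is only appropriate for strings encoded with django.utils.html.escape. More generally, it is a good idea to stick with the standard library: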

# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.0 to 3.8 (HTMLParser.unescape was deprecated in 3.4 and removed in 3.9):
import html.parser
html_parser = html.parser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.4+:
from html import unescape
unescaped = unescape(my_string)
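As a quick check, here is a sketch that applies the Python 3 standard-library function to the string from the question. The variable names are illustrative, and the backslash continuations from the question are replaced by adjacent string literals.

```python
from html import unescape

# The HTML-encoded string from the question, split across literals for readability.
encoded = (
    '&lt;img class=&quot;size-medium wp-image-113&quot; '
    'style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot; '
    'src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot; '
    'alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'
)

decoded = unescape(encoded)
print(decoded)

# unescape also handles named, decimal, and hexadecimal character references.
mixed = unescape("&eacute; &#233; &#xe9;")
print(mixed)  # é é é
```

Unlike the hand-written tuple above, this covers the full set of HTML character references rather than only the five that Django's escape produces.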

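As a suggestion: it may make more sense to store the HTML unescaped in your database. It would be worth looking into getting unescaped results back from BeautifulSoup if possible, and avoiding this process altogether.

With Django, escaping only occurs during template rendering; so to prevent escaping you just tell the templating engine not to escape your string. To do that, use one of these options in your template: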

{{ context_var|safe }}
{% autoescape off %}
    {{ context_var }}
{% endautoescape %}

136 votes

Stack Overflow user

Answered 2011-08-17 13:51:23

With the standard library:

HTML escape

try:
    from html import escape  # Python 3.x
except ImportError:
    from cgi import escape  # Python 2.x

print(escape("<"))

HTML unescape

try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.x (<3.4)
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2.x
    unescape = HTMLParser().unescape

print(unescape("&gt;"))

130 votes

Stack Overflow user

Answered 2009-01-16 01:12:54

For HTML encoding, there is cgi.escape from the standard library:

>>> help(cgi.escape)
cgi.escape = escape(s, quote=None)
    Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.
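On current Python 3, cgi.escape is no longer available (it was removed in Python 3.8), and html.escape is its replacement. The sketch below shows the effect of the `quote` flag; the sample string is made up for illustration. Note that html.escape defaults to `quote=True`, whereas cgi.escape left quotes alone by default.

```python
from html import escape

# Illustrative input containing all five special characters.
text = '<a href="x">Tom & Jerry\'s</a>'

# Default: quote=True, so both double and single quotes are escaped as well.
with_quotes = escape(text)

# quote=False matches the default behaviour of the old cgi.escape.
without_quotes = escape(text, quote=False)

print(with_quotes)
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;
print(without_quotes)
# &lt;a href="x"&gt;Tom &amp; Jerry's&lt;/a&gt;
```

Escaping quotes matters when the result is placed inside an HTML attribute value; for text between tags, `quote=False` is sufficient.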

For HTML decoding, I use the following:

import re
from htmlentitydefs import name2codepoint
# for some reason, python 2.5.2 doesn't have this one (apostrophe)
name2codepoint['#39'] = 39

def unescape(s):
    "unescape HTML code refs; c.f. http://wiki.python.org/moin/EscapingHtml"
    return re.sub('&(%s);' % '|'.join(name2codepoint),
              lambda m: unichr(name2codepoint[m.group(1)]), s)
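The snippet above targets Python 2 (htmlentitydefs, unichr). A rough Python 3 adaptation of the same regex idea might look like the sketch below. The function name `unescape_entities` and the extra handling of numeric references are my additions, not part of the original answer; in practice html.unescape is the simpler choice.

```python
import re
from html.entities import name2codepoint

# Matches hex references (&#x27;), decimal references (&#39;), and named ones (&amp;).
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|\w+);")


def _replace(match):
    ref = match.group(1)
    try:
        if ref[:2] in ("#x", "#X"):
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        return chr(name2codepoint[ref])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range code point: leave the text untouched.
        return match.group(0)


def unescape_entities(s):
    """Replace HTML character references in s in a single pass."""
    return _ENTITY_RE.sub(_replace, s)


sample = unescape_entities("&lt;p&gt;caf&eacute; &#39;x&#x27; &unknown;")
print(sample)  # <p>café 'x' &unknown;
```

Because re.sub works in a single pass, `&amp;lt;` decodes to `&lt;` and not to `<`, so the ordering problem of chained str.replace calls does not arise here.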

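For anything more complicated, I use BeautifulSoup.

80 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: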

https://stackoverflow.com/questions/275174
