
If I understand your goal correctly, this can be achieved by:

<pre><code>word = 'BZh91AY&amp;SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07&lt;]\xc9\x14\xe1BA\x06\xbe\x084'

my_byte_array = word.encode()

print(my_byte_array)
</code></pre>

The result (<code>encode()</code> defaults to UTF-8) is:

<pre><code>b'BZh91AY&amp;SYA\xc2\xaf\xc2\x82\r\x00\x00\x01\x01\xc2\x80\x02\xc3\x80\x02\x00 \x00!\xc2\x9ah3M\x07&lt;]\xc3\x89\x14\xc3\xa1BA\x06\xc2\xbe\x084'
</code></pre>

If this isn't enough, there is a good discussion in this SO <a href="https://stackoverflow.com/a/7585619/8476372">post</a>. It covers the recommended ways (according to PEP) to encode UTF-8 strings to byte arrays, along with the other methods the class provides.
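The printed result is not byte-for-byte what the escapes in the string suggest: under UTF-8, every code point above <code>\x7f</code> becomes two bytes (<code>\xaf</code> turns into <code>\xc2\xaf</code>). A minimal sketch of the difference, using only the first few characters of the string and assuming every code point in it is at most <code>\xff</code>, so that Latin-1 can map each one to a single byte:

```python
# Sketch only: compare the default UTF-8 encoding with Latin-1.
# `word` is a shortened prefix of the string shown above.
word = 'BZh91AY&SYA\xaf\x82\r'

utf8_bytes = word.encode()             # UTF-8 is the default codec
latin1_bytes = word.encode('latin-1')  # one byte per code point <= 0xFF

print(utf8_bytes)    # code points above 0x7F are expanded to two bytes
print(latin1_bytes)  # code points map one-to-one onto bytes
print(len(word), len(utf8_bytes), len(latin1_bytes))
```

Latin-1 raises <code>UnicodeEncodeError</code> as soon as the string holds a code point above <code>\xff</code>, so this only helps when the text really is a one-to-one image of the original bytes.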


Your bug exists earlier. The only acceptable solution is to change the scraping code so that it returns a bytes object rather than a text object. Do not try to "convert" your string <code>un</code> into bytes; it cannot be done reliably.

Do NOT do this:

<pre><code>&gt;&gt;&gt; import bz2
&gt;&gt;&gt; un = 'BZh91AY&amp;SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07&lt;]\xc9\x14\xe1BA\x06\xbe\x084'
&gt;&gt;&gt; bz2.decompress(un.encode('raw_unicode_escape'))
b'huge'
</code></pre>

The <code>raw_unicode_escape</code> codec is essentially Latin-1 with a built-in fallback for characters outside that range: it writes other code points as \uXXXX and \UXXXXXXXX. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol. For Unicode characters that cannot be represented as a \xXX sequence, your data will become corrupted.
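A small sketch of that behaviour, using illustrative characters that are not taken from the question. It shows a string inside the Latin-1 range, one code point outside it, and a string that already contains a literal backslash:

```python
# Sketch only: how raw_unicode_escape treats three kinds of input.
latin1_range = '\xaf\x82'.encode('raw_unicode_escape')   # both code points <= 0xFF
outside_range = '\u20ac'.encode('raw_unicode_escape')    # EURO SIGN, above 0xFF
backslash = '\\x41'.encode('raw_unicode_escape')         # literal backslash + "x41"

print(latin1_range)   # two bytes, one per character
print(outside_range)  # six ASCII bytes spelling out the \u escape
print(backslash)      # four bytes; the backslash is passed through unchanged
```

The second case is where data is silently lost: the single character becomes six bytes of ASCII text, which is no longer the byte sequence a decompressor expects.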

The web scraping code has no business returning bz2-encoded bytes as a <code>str</code>, so that's where you need to address the cause of the problem, rather than attempting to deal with the symptoms.
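One way to stay in bytes throughout is sketched below. It rests on an assumption the question's output hints at (the doubled backslashes): that the page stores the value as escape sequences written out as literal text inside a comment. The names `sample_page`, `escape_as_text` and `extract_bytes` are hypothetical, and `sample_page` is a locally built stand-in for the raw response body (`r.content`) so the sketch runs without network access.

```python
import bz2
import re


def escape_as_text(data: bytes) -> bytes:
    """Hypothetical helper: spell every byte out as literal \\xNN text."""
    return ''.join(f'\\x{b:02x}' for b in data).encode('ascii')


# Stand-in for the raw response body; built locally for the sketch.
sample_page = b"<!--\nun: '" + escape_as_text(bz2.compress(b'huge')) + b"'\n-->"

# A bytes pattern, so the search never goes through str.
PATTERN = re.compile(rb"[a-z]{2}: '(.+)'")


def extract_bytes(page: bytes) -> bytes:
    """Find the escaped value in the page and turn its escapes into real bytes."""
    match = PATTERN.search(page)
    if match is None:
        raise ValueError("no escaped value found in page")
    escaped = match.group(1)
    # unicode_escape interprets the \xNN text; Latin-1 maps the resulting
    # code points (all <= 0xFF for \xNN escapes) back to single bytes.
    return escaped.decode('unicode_escape').encode('latin-1')


un = extract_bytes(sample_page)
print(bz2.decompress(un))
```

If the real page contained escapes such as \uXXXX for code points above 0xFF, the final Latin-1 step would raise an error rather than silently corrupt the data, which is the safer failure mode here.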

I have a string:

<pre><code>'BZh91AY&amp;SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07&lt;]\xc9\x14\xe1BA\x06\xbe\x084'
</code></pre>

And I want:

<pre><code>b'BZh91AY&amp;SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07&lt;]\xc9\x14\xe1BA\x06\xbe\x084'
</code></pre>

But I keep getting:

<pre><code>b'BZh91AY&amp;SYA\\xaf\\x82\\r\\x00\\x00\\x01\\x01\\x80\\x02\\xc0\\x02\\x00 \\x00!\\x9ah3M\\x07&lt;]\\xc9\\x14\\xe1BA\\x06\\xbe\\x084'
</code></pre>

Context

I scraped a string off of a webpage and stored it in the variable <code>un</code>. Now I want to decompress it using BZip2:

<pre><code>bz2.decompress(un)
</code></pre>

However, since <code>un</code> is a <code>str</code> object, I get this error:

<pre><code>TypeError: a bytes-like object is required, not 'str'
</code></pre>

Therefore, I need to convert <code>un</code> to a bytes-like object without changing the single backslash to an escaped backslash.

Edit 1:
Thank you for all the help!
@wim I understand what you mean now, but I am at a loss as to how I can retrieve a bytes-like object from my webscraping method:

<pre><code>import re

import requests
from lxml import html

r = requests.get('http://www.pythonchallenge.com/pc/def/integrity.html')

doc = html.fromstring(r.content)
comment = doc.xpath('//comment()')[0].text.split('\n')[1:3]

pattern = re.compile("[a-z]{2}: '(.+)'")

un = re.search(pattern, comment[0]).group(1)
</code></pre>

The packages that I am using are <code>requests</code>, <code>lxml.html</code>, <code>re</code>, and <code>bz2</code>.

Once again, my goal is to decompress <code>un</code> using <code>bz2</code>, but I am having difficulty getting a bytes-like object from my webscraping process.

Any pointers?

Python: Convert Raw String to Bytes String without adding escape characters
