
<code>\xa0</code> is the non-breaking space, code point 160 in Latin-1 (ISO 8859-1), i.e. <code>chr(160)</code>. You should replace it with a regular space.

<code>string = string.replace(u'\xa0', u' ')</code>

Calling <code>.encode('utf-8')</code> encodes the Unicode string as UTF-8, in which every code point is represented by 1 to 4 bytes. In this case, <code>\xa0</code> is represented by the 2 bytes <code>\xc2\xa0</code>.
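
The two operations above can be compared side by side. This is a minimal sketch that assumes Python 3 (where <code>str</code> is Unicode text); the sample string is made up for illustration.

```python
# Sketch assuming Python 3, where str is Unicode text and bytes is a separate type.
text = "Price:\xa0100"                # contains U+00A0 (non-breaking space)

cleaned = text.replace("\xa0", " ")    # swap the non-breaking space for a plain space
encoded = text.encode("utf-8")         # the non-breaking space becomes the bytes C2 A0

print(repr(cleaned))
print(encoded)
```

Replacing changes the text itself, while encoding only changes how the same text is stored as bytes.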

Read up on <a href="http://docs.python.org/howto/unicode.html" rel="noreferrer">http://docs.python.org/howto/unicode.html</a>. 

Please note: this answer is from 2012. Python has moved on, and you should be able to use <code>unicodedata.normalize</code> now.


There are many useful things in Python's <code>unicodedata</code> library. One of them is the <a href="https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize" rel="noreferrer"><code>.normalize()</code></a> function.

Try:

<pre><code>import unicodedata

new_str = unicodedata.normalize("NFKD", unicode_str)
</code></pre>

Replace <code>NFKD</code> with one of the other normalization forms listed in the link above (<code>NFC</code>, <code>NFD</code>, <code>NFKC</code>) if you don't get the results you're after.
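
To see how the forms differ on <code>\xa0</code>, here is a small sketch assuming Python 3; the sample string is invented for illustration. The compatibility forms (<code>NFKC</code>, <code>NFKD</code>) map the non-breaking space to a plain space, while <code>NFC</code> leaves it alone. Note that <code>NFKD</code> also splits accented letters into a base letter plus a combining mark, which may or may not be what you want.

```python
import unicodedata

# Sample text (hypothetical): precomposed "é" plus two non-breaking spaces.
text = "caf\u00e9\xa0au\xa0lait"

nfkc = unicodedata.normalize("NFKC", text)  # NBSP -> space, "é" stays precomposed
nfkd = unicodedata.normalize("NFKD", text)  # NBSP -> space, "é" -> "e" + U+0301
nfc = unicodedata.normalize("NFC", text)    # canonical only: NBSP is kept

print(repr(nfkc))
print(repr(nfkd))
print(repr(nfc))
```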


Try calling <code>.strip()</code> on your line; <code>line.strip()</code> worked well for me. Note that it only removes whitespace at the start and end of the string.
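
A quick sketch of what <code>strip()</code> does and does not cover, assuming Python 3 (where <code>str.strip()</code> treats <code>\xa0</code> as whitespace); the sample string is made up.

```python
# Sketch assuming Python 3; the sample string is hypothetical.
text = "\xa0 total\xa0cost \xa0"

stripped = text.strip()                # removes leading/trailing whitespace only
fully = stripped.replace("\xa0", " ")  # interior NBSPs need an explicit replace

print(repr(stripped))
print(repr(fully))
```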


Try this:

<pre><code># '\\xa0' matches the literal text \xa0 (backslash, x, a, 0), not the character itself
string = string.replace('\\xa0', ' ')
</code></pre>


I ran into this same problem pulling some data from a sqlite3 database with Python. The answers above didn't work for me (not sure why), but this did: <code>line = line.decode('ascii', 'ignore')</code>. Note that my goal was to delete the <code>\xa0</code> characters rather than replace them with spaces.
I got this from <a href="http://nedbatchelder.com/text/unipain.html" rel="noreferrer">this super-helpful Unicode tutorial by Ned Batchelder</a>.
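
The line above is Python 2, where it operates on a byte string. A rough Python 3 counterpart (an assumption on my part, not something from the tutorial) is to encode with <code>'ignore'</code> and decode back. The sketch below shows that this drops every non-ASCII character, accented letters included.

```python
# Sketch assuming Python 3; the sample string is hypothetical.
text = "na\u00efve\xa0caf\u00e9"

# Characters that cannot be encoded as ASCII are silently dropped.
ascii_only = text.encode("ascii", "ignore").decode("ascii")

print(repr(ascii_only))
```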


Try this code:

<pre><code>import re
re.sub(r'[^\x00-\x7F]+','','paste your string here').decode('utf-8','ignore').strip()
</code></pre>
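
The snippet above is written for Python 2 byte strings. A possible Python 3 variant of the same idea is sketched below, with no <code>.decode()</code> step because <code>str</code> is already text; the constant name and sample string are made up. Like the original, it deletes all non-ASCII characters, not just <code>\xa0</code>.

```python
import re

# Hypothetical name; matches runs of characters outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

text = "  10\xa0\xa0km \u2013 done  "

# Delete non-ASCII runs, then trim leading/trailing whitespace.
cleaned = NON_ASCII.sub("", text).strip()

print(repr(cleaned))
```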


Python recognizes it as a whitespace character, so you can call <code>split</code> without arguments and join the pieces with a normal space:

<pre><code>line = ' '.join(line.split())
</code></pre>
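
One side effect worth knowing: this collapses every run of whitespace, including tabs and newlines, into a single space. A small sketch assuming Python 3, with a made-up sample line:

```python
# Sketch assuming Python 3; the sample line is hypothetical.
line = "first\xa0\xa0second\t third\n"

# split() with no arguments splits on any run of whitespace, NBSP included.
normalized = " ".join(line.split())

print(repr(normalized))
```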


I ended up here while googling a problem with non-printable characters. I use MySQL with <code>UTF-8</code> and <code>general_ci</code>, and I deal with Polish text. For problematic strings I have to proceed as follows:

<pre><code>text = text.replace('\xc2\xa0', ' ')
</code></pre>

This is just a quick workaround; you should probably fix your encoding setup instead.
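
The workaround operates on UTF-8 bytes, which is why it looks for <code>\xc2\xa0</code> rather than <code>\xa0</code>. Here is a sketch of the two equivalent routes, assuming Python 3 and a made-up Polish sample: patch the bytes directly, or decode first and replace the character.

```python
# Sketch assuming Python 3; the sample text is hypothetical.
raw = "dzie\u0144\xa0dobry".encode("utf-8")   # bytes as they might arrive from a database

patched = raw.replace(b"\xc2\xa0", b" ")             # byte-level workaround
decoded = raw.decode("utf-8").replace("\xa0", " ")   # decode first, then replace the character

print(patched.decode("utf-8"))
print(decoded)
```

Decoding once at the boundary and working with text afterwards is usually the easier of the two to reason about.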


The Unicode code point 0xA0 is encoded as the two bytes 0xC2 0xA0 in UTF-8. <code>.encode('utf8')</code> simply takes your Unicode 0xA0 and produces UTF-8's 0xC2 0xA0, hence the appearance of the 0xC2s. Encoding is not replacing, as you've probably realized by now.
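
A short sketch of where the stray 0xC2 comes from, assuming Python 3: the encoded bytes are correct, and the extra character only shows up when those bytes are read back with the wrong codec (Latin-1 here, chosen for illustration).

```python
# Sketch assuming Python 3.
nbsp = "\xa0"

utf8_bytes = nbsp.encode("utf-8")          # two bytes: C2 A0
round_trip = utf8_bytes.decode("utf-8")    # decoding as UTF-8 restores the original
misread = utf8_bytes.decode("latin-1")     # wrong codec: each byte becomes a character

print(utf8_bytes)
print(repr(round_trip))
print(repr(misread))
```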


In Beautiful Soup, you can pass the <code>strip</code> parameter to <code>get_text()</code>, which strips whitespace from the beginning and end of the text. This removes <code>\xa0</code> or any other whitespace that occurs at the start or end of the string. In my case Beautiful Soup had put <code>\xa0</code> where there was an empty string, and this solved the problem for me.

<pre><code>mytext = soup.get_text(strip=True)
</code></pre>


It's the equivalent of a space character, so strip it:

<pre><code>print(string.strip()) # no more xa0
</code></pre>


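A generic version using a regular expression (it removes every escape sequence of the form <code>\x..</code> that appears literally in the text):

<pre><code>import re

def remove_control_chart(s):
    # Matches a literal backslash, an "x", and the two characters after it
    return re.sub(r'\\x..', '', s)
</code></pre>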
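
Because <code>..</code> matches any two characters, the pattern above can also eat text that is not a hex escape. A slightly stricter sketch, assuming Python 3 and using hypothetical names, restricts the match to two hex digits. Like the original, it targets the escape sequence written out as text and leaves a real <code>\xa0</code> character untouched.

```python
import re

# Hypothetical names; matches a literal backslash, "x", and exactly two hex digits.
HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")

def remove_hex_escapes(s):
    """Delete escape sequences such as \\xa0 that appear literally in the text."""
    return HEX_ESCAPE.sub("", s)

sample = r"foo\xa0bar C:\xyz"    # raw string: the backslashes are literal
print(remove_hex_escapes(sample))
```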


You can try <code>string.strip()</code> 
It worked for me! :)

I am currently using Beautiful Soup to parse an HTML file and calling <code>get_text()</code>, but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove all of them in Python 2.7, and change them into spaces? I guess the more generalized question would be, is there a way to remove Unicode formatting?

I tried using: <code>line = line.replace(u'\xa0',' ')</code>, as suggested by another thread, but that changed the \xa0's to u's, so now I have "u"s everywhere instead. ):

EDIT: The problem seems to be resolved by <code>str.replace(u'\xa0', ' ').encode('utf-8')</code>, but just doing <code>.encode('utf-8')</code> without <code>replace()</code> seems to cause it to spit out even weirder characters, \xc2 for instance. Can anyone explain this?

How to remove \xa0 from string in Python?
