I have some messy text with a lot of tags and non-ASCII characters, like this:
val =
"\nRated\xa0\n I have been to this place for dinner tonight.
\nWell I didn't found anything extraordinary there but indeed a meal worth
the price. The number of barbeque item and other both were good.\n\nFood: 3.5/5\"
To clean it up I used this:
val.text.replace('\t', '').replace('\n', '').encode('ascii','ignore').decode("utf-8").replace('Rated','').replace(' ','')
By chaining several replace calls, I got my output:
I have been to this place for dinner tonight. Well I didn't found anything extraordinary there but indeed a meal worth the price. The number of barbeque item and other both were good. Food: 3.5/5
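I would like to know whether there is a way to do this kind of replacement with a single replace call, instead of chaining them as in this case: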
replace('\t', '').replace('\n', '').replace(' ','')
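Posted on 2018-06-12 08:06:07
You can use .translate to remove \n and \t (the two-argument call below is the Python 2 form of str.translate), then run your space replacement: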
>>> val.translate(None,'\n\t').replace(' ','')
"Rated I have been to this place for dinner tonight.Well I didn't found anything extraordinary there but indeed a meal worth the price. The number of barbeque item and other both were good.Food: 3.5/5"
replace(' ','') runs into trouble on runs with an even number of spaces (they are removed entirely). You might consider a regular expression instead:
>>> import re
>>> re.sub(r'(\b *\b)',' ',val.translate(None,'\n\t'))
"Rated I have been to this place for dinner tonight.Well I didn't found anything extraordinary there but indeed a meal worth the price. The number of barbeque item and other both were good.Food: 3.5/5"
Posted on 2018-06-12 08:03:48
Even though I do not use replace, I still think this is the best way:
import string
val = """\nRated\xa0\n I have been to this place for dinner tonight.
\nWell I didn't found anything extraordinary there but indeed a meal worth
the price. The number of barbeque item and other both were good.\n\nFood: 3.5/5\"""
"""
print(''.join([i for i in ' '.join(val.split()) if i in string.ascii_letters+' ']))
Output:
Rated I have been to this place for dinner tonight Well I didnt found anything extraordinary there but indeed a meal worth the price The number of barbeque item and other both were good Food
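The filter above keeps only ASCII letters and spaces, which is why the apostrophe in "didn't" and the rating "3.5/5" are missing from the output. If those should survive, one possible variant is sketched below. The function clean_review and its label parameter are hypothetical names, and the sample string is shortened for illustration:

```python
def clean_review(text, label='Rated'):
    """Collapse whitespace, drop non-ASCII characters and a leading label."""
    # str.split() with no arguments splits on any whitespace run,
    # including \n, \t and the non-breaking space \xa0.
    collapsed = ' '.join(text.split())
    # Drop any remaining non-ASCII characters, keeping digits and punctuation.
    ascii_only = collapsed.encode('ascii', 'ignore').decode('ascii')
    # Assumption: the label is removed only when it starts the text.
    if ascii_only.startswith(label):
        ascii_only = ascii_only[len(label):].lstrip()
    return ascii_only


val = "\nRated\xa0\n I didn't find it extraordinary.\n\nFood: 3.5/5"
print(clean_review(val))
```

Unlike the question's replace('Rated',''), this sketch removes the label only at the start of the text, so the word is left alone if it appears inside a review.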
https://stackoverflow.com/questions/50807518