I have some scraped data that is full of annoying escape characters:
{"website": "http://www.zebrawebworks.com/zebra/bluetavern/day.cfm?&year=2018&month=7&day=10", "headliner": ["\"Roda Vibe\" with the Tallahassee Choro Society"], "data": [" \r\n ", "\r\n\t\r\n\r\n\t", "\r\n\t\r\n\t\r\n\t", "\r\n\t", "\r\n\t", "\r\n\t", "8:00 PM", "\r\n\t\r\n\tFEE: $2 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ", "\r\n\tEvery 2nd & 4th Tuesday of the month, the Choro Society returns to Blue Tavern with that subtly infectious Brazilian rhythm and beautiful melodies that will stay with you for days. The perfect antidote to Taylor Swift. $2 for musicians; tips appreciated. ", "\r\n\t", "\r\n\t\r\n\t", "\r\n\t", "\r\n\t", "\r\n\t\r\n\t\r\n\r\n\t\r\n\t", "\r\n\t\r\n\t\t", "\r\n", "\r\n", "\r\n", "\r\n"]},
I am trying to write a function that removes these characters, but neither of my two strategies works:
# strategy 1
escapes = ''.join([chr(char) for char in range(1, 32)])
table = {ord(char): None for char in escapes}
for item in concert['data']:
    item = item.translate(table)
# strategy 2
for item in concert['data']:
    for char in item:
        char = char.replace("\r", "").replace("\t", "").replace("\n", "")
Why is my data still full of the escape characters that I tried to remove in two different ways?
Posted on 2018-06-21 09:09:12
Consider the following:
lst = ["aaa", "abc", "def"]
for x in lst:
    x = x.replace("a","z")
print(lst) # ['aaa', 'abc', 'def']
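It looks as if the list is unchanged, and it is. Reassigning the variable used in the for loop (x) works inside the loop, but it only rebinds the name x to a new string, so the change never propagates back to lst.
Instead: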
for i, x in enumerate(lst):
    lst[i] = x.replace("a","z")
print(lst) # ['zzz', 'zbc', 'def']
Or:
for i in range(len(lst)):
    lst[i] = lst[i].replace("a","z")
print(lst) # ['zzz', 'zbc', 'def']
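Applied to the data in the question, the same index-based assignment could look like the sketch below. The concert dict here is a shortened, hypothetical stand-in for the scraped record, and both the strip() call and the removal of entries that end up empty are assumptions about the desired result, not something the question spells out.

```python
# Shortened, hypothetical stand-in for the scraped record in the question.
concert = {
    "headliner": ['"Roda Vibe" with the Tallahassee Choro Society'],
    "data": [" \r\n ", "\r\n\t", "8:00 PM", "\r\n\t\r\n\tFEE: $2 \u00a0\u00a0 "],
}

# Map the control characters (code points 1-31, as in the question) to None
# so that str.translate() deletes them.
table = {code: None for code in range(1, 32)}

for i, item in enumerate(concert["data"]):
    # Assign through the index so the list itself is updated.
    # strip() also trims spaces and non-breaking spaces (\u00a0) at both ends.
    concert["data"][i] = item.translate(table).strip()

# Assumption: entries that are now empty are not wanted.
concert["data"] = [item for item in concert["data"] if item]

print(concert["data"])  # ['8:00 PM', 'FEE: $2']
```

Strategy 2 in the question has the same rebinding problem one level deeper: char is a one-character string taken from item, and reassigning it touches neither item nor the list.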
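Edit
Because you are using assignment (x = ...), you have to write the result back into the original list with something like lst[i] = ....
For immutable types, which include strings, this is really your only option. x.replace("a","z") does not change x; it returns a new string with the specified replacements.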
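A minimal sketch of that behaviour, using the throwaway names x and y purely for illustration:

```python
x = "abc"
y = x.replace("a", "z")  # builds and returns a new string object

print(x)       # abc  (the original string is untouched)
print(y)       # zbc
print(x is y)  # False: x and y refer to two different objects
```

The result of replace() is only kept if it is bound to a name or stored somewhere, such as a list slot.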
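For mutable types (for example lists), you can modify the object bound to the loop variable in place, that is, the x in for x in lst:.
So in something like the following, the changes made through x do show up in lst.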
lst = [[1],[2],[3]]
for x in lst:
    x.append('added') # Example of in-place modification
print(lst) # [[1, 'added'], [2, 'added'], [3, 'added']]
This works because x.append(), unlike str.replace(), does change the object that x refers to.
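For a list of strings, where no such in-place method exists, another way to get the new values into the list is a list comprehension combined with slice assignment. This is an alternative sketch rather than part of the answer above; the name alias is introduced only to show that the existing list object is updated.

```python
lst = ["aaa", "abc", "def"]
alias = lst  # a second name for the same list object

# Slice assignment replaces the contents of the existing list object
# instead of rebinding the name lst to a brand-new list.
lst[:] = [x.replace("a", "z") for x in lst]

print(lst)    # ['zzz', 'zbc', 'def']
print(alias)  # ['zzz', 'zbc', 'def']
```

A plain lst = [...] would also work when nothing else holds a reference to the old list.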
https://stackoverflow.com/questions/50958937