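I have a string from which I need to remove the timestamps and punctuation. I also need to strip all digits, except that the value of responseCode (400 in this example) must stay intact: wherever that 400 appears, it should not be removed. In addition, I want to remove all URLs and file names ending in tar.gz.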
mystr = """sun aug 19 13:02:09 2018 I_am.98189: hello please connect to the local host:8080
sun aug 19 13:02:10 2018 hey.94289: hello not able to find the file
sun aug 19 13:02:10 2018 I_am.94289: Base url for file_transfer is: abc/vd/filename.tar.gz
mon aug 19 13:02:10 2018 how_94289: $var1={
'responseCode' = '400',
'responseDate' = 'Sun, 19 Aug 2018 13:02:08 ET',
'responseContent' = 'ABC' }
mon aug 20 13:02:10 2018 hello!94289: Error performing action, failed with error code [400]
"""
Expected result:
"I_am hello please connect to the local host
hello not able to find the file
Base url for file_transfer
var1
responseCode = 400
responseDate
responseContent = ABC
Error performing action, failed with error code 400
"
My attempt so far, which only removes the punctuation:
punctuations = r'''!=()-[]{};:'"\,<>.?@#$%^&*_~'''
no_punct = ""
for char in mystr:
    if char not in punctuations:
        no_punct = no_punct + char
# display the unpunctuated string
print(no_punct)
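The character-by-character loop above can also be sketched with `str.translate`, which deletes a set of characters in one pass. The helper name `strip_punctuation` and its `keep` parameter are made up for this illustration; they are not part of the original post. Passing `keep="=_"` would, for example, preserve the `=` signs and the underscore in `I_am` that the expected result retains.

```python
import string

def strip_punctuation(text, keep=""):
    """Remove ASCII punctuation from text, except the characters listed in keep.

    Hypothetical helper for illustration only.
    """
    # Build the set of punctuation characters that should be deleted.
    to_delete = "".join(ch for ch in string.punctuation if ch not in keep)
    # str.maketrans with a third argument maps each listed character to None.
    return text.translate(str.maketrans("", "", to_delete))

print(strip_punctuation("'responseCode' = '400',", keep="="))
```

This only handles punctuation; timestamps, digits and URLs still need separate treatment.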
Posted on 2018-08-22 06:02:52
Maybe:
import re

patterns = [r"\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\s*",  # sun aug 19 13:02:10 2018
            r"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{2}\s*",  # Sun, 19 Aug 2018 13:02:08 ET
            r":\s*([\da-zA-Z]+\/)+([a-zA-Z0-9\.]+)",  # URL
            r"([a-zA-Z_!]+)[\.!_]\d+:\s*",  # word[._!]number: followed by optional whitespace
            r":\d+",  # port number such as :8080
            r"[/':,${}\[\]]"  # punctuation
            ]
s = mystr
# apply the patterns in order; each match is replaced with an empty string
for p in patterns:
    s = re.sub(p, '', s)
s = s.strip()
print(s)
Output:
hello please connect to the local host
hello not able to find the file
Base url for file_transfer is
var1=
responseCode = 400
responseDate =
responseContent = ABC
Error performing action failed with error code 400
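The patterns above keep 400 only because none of them happens to match it. A more explicit way to meet the "remove every number except the responseCode value" requirement might look like the sketch below. The function name `remove_digits_except_code` is hypothetical, and the sketch assumes the code always appears as `'responseCode' = '<digits>'`, as in the sample string. Digits that directly follow a letter or another digit (as in `var1`) are left alone, since the expected result keeps `var1`.

```python
import re

def remove_digits_except_code(text):
    """Drop standalone digit runs unless they equal the responseCode value.

    Hypothetical helper; assumes the 'responseCode' = '<digits>' layout
    shown in the sample string.
    """
    match = re.search(r"'responseCode'\s*=\s*'(\d+)'", text)
    code = match.group(1) if match else None

    def keep_or_drop(m):
        # Keep the run only when it is exactly the response code.
        return m.group(0) if m.group(0) == code else ""

    # The lookbehind skips digits glued to a preceding letter or digit,
    # so identifiers such as var1 are not touched.
    return re.sub(r"(?<![A-Za-z0-9])\d+", keep_or_drop, text)

sample = "how_94289: $var1={\n'responseCode' = '400',\nhost:8080 error code [400]"
print(remove_digits_except_code(sample))
```

Running it after the timestamp patterns avoids leaving stray colons behind from values like 13:02:10.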
https://stackoverflow.com/questions/51956359