I have a report that contains JSON objects mixed with plain text, as shown below:
Payload
0x0000: some text
text
{
"text1": {
"text2": {
"text3": "value3",
"text4": "value4",
"text5": "value5"
},
"text6": "value6",
"text7": "value7"
},
"text8": "value8"
}
Payload 2
0x0001: some other text
other text
{
"text1": {
"text2": {
"text3": "value3",
"text4": "value4",
"text5": "value5"
},
"text6": "value6",
"text7": "value7"
},
"text8": "value8"
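}
What I want to do is read the file, extract these JSON objects, and get specific values from each one. The problem is that the report is large, and the plain text between the JSON objects does not contain the same words every time. I tried json.loads(.), which fails, and json.dumps(.), which does not ignore the plain-text strings.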
filedes = open(path, "r")
# Reading the whole file
text = filedes.read()
text = json.dumps(text)
Any ideas on how to parse this text without manually deleting the plain-text lines?
Posted on 2022-08-23 20:21:20
You need to scan the whole file line by line and keep track of where each JSON object begins and ends.
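import json

values = []
with open('data.txt') as f:
    in_object = False
    data = None
    for line in f:
        line = line.strip()
        if line == "}":
            # A bare "}" closes the current top-level object.
            # Note: a nested object that also closes with a bare "}" line
            # (no trailing comma) would be mistaken for the end of the object.
            if data is not None:
                data.append(line)
                content = '\n'.join(data)
                values.append(json.loads(content))
            in_object = False
            data = None  # reset so a stray "}" is not parsed again
        elif line == "{":
            # A bare "{" opens a new top-level object.
            in_object = True
            data = [line]
        else:
            # Keep lines inside an object; skip the plain text in between.
            if in_object and data is not None:
                data.append(line)

print(f'Found {len(values)} values')
for v in values:
    print(v)

https://stackoverflow.com/questions/73464359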
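As an alternative to tracking braces line by line, the standard library's json.JSONDecoder.raw_decode can parse one JSON value starting at a given index and report where it ended, so it does not depend on how the braces are laid out on lines. The following is only a minimal sketch: the helper name iter_json_objects and the sample string are made up for illustration (the sample is shaped like the report in the question), and it assumes the whole report fits in memory as one string.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON object found inside mixed text.

    Hypothetical helper, not part of the original answer.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return  # no more candidate objects
        try:
            # raw_decode returns the parsed value and the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # This "{" does not start valid JSON (e.g. a brace in plain text).
            pos = start + 1
            continue
        yield obj
        pos = end  # resume after the object, skipping its nested braces


# Made-up sample in the shape of the report from the question.
sample = """Payload
0x0000: some text
text
{
"text1": {"text2": {"text3": "value3"}, "text6": "value6"},
"text8": "value8"
}
Payload 2
0x0001: some other text
{
"text1": {"text2": {"text3": "other"}},
"text8": "value8"
}
"""

objects = list(iter_json_objects(sample))
# Pull one specific value out of every extracted object.
text3_values = [obj["text1"]["text2"]["text3"] for obj in objects]
print(text3_values)  # ['value3', 'other']
```

For a real report, the string could come from open(path).read(). If some objects may lack a key, dict.get with a default avoids a KeyError.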