I have a corpus.json file that needs to be converted to TSV format. It is a huge file that looks like this:
{'0': {'metadata': {'id': 'fQ3JoXLXxc4', 'title': '| Board Questions | 12 Maths | Equivalence Class | Equivalence Class Board Questions |', 'tags': ['Board Questions', '12 maths', '12 maths Board Questions', 'Previous Year Board Questions', 'Maths Board Questions', 'Board questions based on Equivalence Classes', 'Equivalence Class', 'Equivalence Classes in hindi'], 'description': 'Board Questions, 12 maths, 12 maths Board Questions, Previous Year Board Questions, Maths Board Questions, Board questions based on Equivalence Classes, Equivalence Class, Equivalence Classes in hindi, Equivalence Class for 12 maths, NCERT CBSE XII Maths,'}}, '1': {'subtitles': ' in this video were going to start taking a look at entropy and tropi and more specifically the kind of entropy we are going to be interested in is information entropy information entropy as opposed to another kind of entropy which you may have heard a probably heard of thermodynamic entropy information entropy comes up in the context of information theory there is actually a direct connection with thermodynamic entropy but were not going to address that here so what is entropy what is information entropy well you can think about it sort of intuitively as the uncertainty uncertainty put that in quotes since we dont really have a definition for uncertainty but you can think about it as the uncertainty in a random variable or random quantity or equivalently you can think about it as the information ....and so on
I am using the following code:
import json
import csv
with open('Downloads/corpus.json') as json_file:
    j = json.load(json_file)
with open('output.tsv', 'w') as output_file:
    dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
    dw.writeheader()
    dw.writerows(j)
I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-110-a9cb3b17fdd1> in <module>()
2 dw = csv.DictWriter(output_file, sorted(j.keys()), delimiter='\t')
3 dw.writeheader()
----> 4 dw.writerows(j)
~/anaconda3/lib/python3.6/csv.py in writerows(self, rowdicts)
156
157 def writerows(self, rowdicts):
--> 158 return self.writer.writerows(map(self._dict_to_list, rowdicts))
159
160 # Guard Sniffer's type checking against builds that exclude complex()
~/anaconda3/lib/python3.6/csv.py in _dict_to_list(self, rowdict)
146 def _dict_to_list(self, rowdict):
147 if self.extrasaction == "raise":
--> 148 wrong_fields = rowdict.keys() - self.fieldnames
149 if wrong_fields:
150 raise ValueError("dict contains fields not in fieldnames: "
AttributeError: 'str' object has no attribute 'keys'
What should be changed in this code? Or is there another way to do this?
Posted on 2018-04-04 19:28:22
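Your code is correct. The only problem is that you are converting the JSON object back to a str, which, as the other answer mentions, makes no sense here.
What are you trying to achieve with sorted(py_str[0].keys())? Just try it without the [0].
Minor detail: you can use a single with statement instead of two: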
import json
import csv
with open('output.tsv', 'w') as output_file, open('Downloads/corpus.json') as json_file:
    json_dict = json.load(json_file)
    dw = csv.DictWriter(output_file, sorted(json_dict.keys()), delimiter='\t')
    dw.writeheader()
    # writerow() takes a single dict; writerows() would iterate over the dict's string keys
    dw.writerow(json_dict)
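If the goal is one TSV row per corpus entry rather than one column per top-level key, a minimal sketch could look like the following. It assumes each top-level value is a dict such as {'metadata': {...}} or {'subtitles': '...'}, as in the sample from the question. The helper names flatten and corpus_to_tsv, the dotted column names, and the extra "entry" column are hypothetical choices for illustration, not something the question specifies.

```python
import csv
import io


def flatten(entry, parent_key=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in entry.items():
        name = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            # Assumption: lists (such as 'tags') are joined into one cell.
            flat[name] = ", ".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


def corpus_to_tsv(corpus, output_file):
    """Write one row per top-level key of the corpus dict."""
    rows = []
    for entry_id, entry in corpus.items():
        row = {"entry": entry_id}
        row.update(flatten(entry))
        rows.append(row)
    # Collect every column that appears in any row; missing cells stay empty.
    columns = sorted({key for row in rows for key in row} - {"entry"})
    writer = csv.DictWriter(output_file, ["entry"] + columns,
                            delimiter="\t", restval="")
    writer.writeheader()
    writer.writerows(rows)  # rows is a list of dicts, as writerows() expects


# Small stand-in for the real file, shaped like the sample in the question.
sample = {
    "0": {"metadata": {"id": "fQ3JoXLXxc4",
                       "tags": ["Board Questions", "12 maths"]}},
    "1": {"subtitles": "in this video were going to start taking a look at entropy"},
}
buffer = io.StringIO()
corpus_to_tsv(sample, buffer)
tsv_text = buffer.getvalue()
```

For the real file, the same function could be called with the result of json.load and a file opened with open('output.tsv', 'w', newline=''), which is how the csv module documentation recommends opening files for writing.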
Posted on 2018-04-04 19:21:38
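j is your JSON-like object; it is a dictionary. Without knowing what you are trying to do, I don't think you need py_str=json.dumps(j), because that converts the JSON-like dict back into a string (which has no keys).
Some example commands from an interactive session: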
>>> import json
>>> py_str = json.loads('{ "a": "b", "c": "d"}')
>>> py_str
{'a': 'b', 'c': 'd'}
>>> json.dumps(py_str)
'{"a": "b", "c": "d"}'
>>> py_str.keys()
dict_keys(['a', 'c'])
>>> json.dumps(py_str)[0]
'{' # This is the cause of the failure
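As a hedged aside, the traceback in the question points at dw.writerows(j), and the same AttributeError can be reproduced without json.dumps at all: iterating over a dict yields its keys, so writerows() receives plain strings where it expects dicts. The sketch below uses a small stand-in dict and an in-memory buffer instead of the real file; both are assumptions for illustration.

```python
import csv
import io

# Stand-in for the loaded corpus: top-level keys are strings like '0', '1'.
j = {"0": {"metadata": {"id": "fQ3JoXLXxc4"}}, "1": {"subtitles": "..."}}

# Iterating a dict yields its keys, not its values.
items_seen = list(j)

writer = csv.DictWriter(io.StringIO(), sorted(j.keys()), delimiter="\t")
try:
    writer.writerows(j)  # each "row" handed to the writer is a str key
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Passing the dict as a single row works, because its keys match the fieldnames.
ok_buffer = io.StringIO()
ok_writer = csv.DictWriter(ok_buffer, sorted(j.keys()), delimiter="\t")
ok_writer.writeheader()
ok_writer.writerow(j)
header_line = ok_buffer.getvalue().splitlines()[0]
```

Here writerow(j) produces a single wide row with one column per top-level key, which may or may not be the layout that is actually wanted.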
Posted on 2018-04-04 19:23:28
I may be missing something, but in this block:
with open('Downloads/corpus.json') as json_file:
    j = json.load(json_file)
your j is a dictionary containing the JSON data. But in this line:
py_str=json.dumps(j)
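you are converting that data to a string (basically undoing what you just did). The error you are seeing says that a string has no keys.
You should use j instead of py_str when calling the keys method.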
https://stackoverflow.com/questions/49658760