A quick and easy way to flatten your JSON data is to use the flatten_json package, which can be installed via pip:

<pre><code>pip install flatten_json
</code></pre>

I assume you have a list of many entries that look like the one you provided. If so, the following code will give you the desired result:

<pre><code>import pandas as pd
from flatten_json import flatten

json_data = [{...patient1...}, {patient2...}, ...]

flattened = (flatten(entry) for entry in json_data)
df = pd.DataFrame(flattened)
</code></pre>

In the flattened data, the list entries get suffixed with numbers (I added another patient with an additional entry in the "labs" list):

<pre><code>+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| index demo_Profile_age demo_Profile_bmi demo_Profile_height demo_Profile_sex demo_Profile_someinfo1_0 demo_Profile_someinfo2_0 demo_Profile_someinfo3_0 demo_Profile_weight event_info_personal_info1 event_info_personal_info2 event_info_personal_info3 event_info_personal_info4 event_labs_0_name event_labs_0_value event_labs_1_name event_labs_1_value event_symptoms_0_name event_symptoms_0_socrates_associations_0 event_symptoms_0_socrates_onsetType event_symptoms_0_socrates_timeCourse event_symptoms_1_name event_symptoms_1_socrates_timeCourse event_symptoms_2_name event_symptoms_2_socrates_onsetType event_symptoms_3_name event_symptoms_3_socrates_onsetType event_symptoms_4_name event_symptoms_4_socrates_associations_0 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 98 5 160 male some_more_info1 some_more_inf2 some_more_info3 139 219.59 129.18 41.15 94.19 name1 valuelab NaN NaN name1 associations1 onsetType1 timeCourse1 name2 timeCourse2 name3 onsetType2 name4 onsetType3 name5 associations2 |
| 1 98 5 160 male some_more_info1 some_more_inf2 some_more_info3 139 219.59 129.18 41.15 94.19 name1 valuelab name2 valuelabr2 name1 associations1 onsetType1 timeCourse1 name2 timeCourse2 name3 onsetType2 name4 onsetType3 name5 associations2 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
</code></pre>
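To make the numeric suffixes concrete, here is a minimal pure-Python sketch of the same idea. The helper `flatten_record` is hypothetical and written only for illustration; it is not the flatten_json implementation, and the two trimmed patient records are made up to mirror the "labs" example above.

```python
def flatten_record(value, prefix="", sep="_"):
    """Flatten nested dicts and lists into a single flat dict (illustrative helper)."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_record(child, f"{prefix}{key}{sep}", sep))
    elif isinstance(value, list):
        # List positions become part of the key, which produces the numeric suffixes.
        for position, child in enumerate(value):
            flat.update(flatten_record(child, f"{prefix}{position}{sep}", sep))
    else:
        # Leaf value: drop the trailing separator from the accumulated key.
        flat[prefix.removesuffix(sep)] = value
    return flat


# Two made-up patients; the second one has an extra entry in "labs".
patients = [
    {"event": {"labs": [{"name": "name1", "value": "valuelab"}]}},
    {"event": {"labs": [{"name": "name1", "value": "valuelab"},
                        {"name": "name2", "value": "valuelabr2"}]}},
]

rows = [flatten_record(patient) for patient in patients]

# Records with fewer list entries simply lack those keys; filling the gaps
# with None is comparable to the NaN cells in the table above.
columns = sorted({key for row in rows for key in row})
table = [{column: row.get(column) for column in columns} for row in rows]
```

Because the column set is the union of all keys, a dataset where patients have different numbers of symptoms or labs ends up with many sparsely filled columns.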

The flatten function accepts additional parameters that let you drop unwanted columns or prefixes.

Note: While this method gives you the flattened DataFrame you asked for, I expect you will run into other problems when feeding the dataset into a machine learning algorithm, depending on what your prediction target is and how you want to encode the data as features.
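As one illustration of the encoding question, a variable-length list such as the symptom names can be turned into a fixed-width 0/1 vector ("multi-hot" encoding). This is only a sketch of one possible choice, not something the approach above requires; the helper `multi_hot` and the sample lists are hypothetical.

```python
def multi_hot(records, vocabulary=None):
    """Encode each list of labels as a fixed-length 0/1 vector (illustrative helper)."""
    if vocabulary is None:
        # Build the vocabulary from every label seen in the data.
        vocabulary = sorted({label for labels in records for label in labels})
    index = {label: i for i, label in enumerate(vocabulary)}
    vectors = []
    for labels in records:
        vector = [0] * len(vocabulary)
        for label in labels:
            if label in index:  # labels outside the vocabulary are ignored
                vector[index[label]] = 1
        vectors.append(vector)
    return vocabulary, vectors


# Made-up symptom lists of different lengths, one list per patient.
symptoms_per_patient = [
    ["name1", "name2", "name3"],
    ["name2"],
    [],
]

vocabulary, features = multi_hot(symptoms_per_patient)
```

Every patient gets a vector of the same length regardless of how many symptoms were recorded, which is the shape most estimators expect.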

Consider pandas's <a href="http://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.io.json.json_normalize.html" rel="nofollow noreferrer">json_normalize</a>. However, because the data contains deeper levels of nesting, consider normalizing each piece separately, then concatenating the pieces and forward-filling the "normalized" columns.

<pre><code>import json
import pandas as pd

# Load one record; the file must contain valid JSON (no trailing commas)
with open('myfile.json', 'r') as f:
    data = json.load(f)

# Normalize each part separately, then place the pieces side by side.
# pd.json_normalize is available in pandas 1.0+; older versions used
# "from pandas.io.json import json_normalize" instead.
final_df = pd.concat([pd.json_normalize(data['demo_Profile']),
                      pd.json_normalize(data['event']['symptoms']),
                      pd.json_normalize(data['event']['info_personal']),
                      pd.json_normalize(data['event']['labs'])], axis=1)

# FLATTEN NESTED LISTS: keep the first element of each list, leave NaN as is
n_list = ['someinfo1', 'someinfo2', 'someinfo3', 'socrates.associations']

final_df[n_list] = final_df[n_list].apply(lambda col:
                     col.apply(lambda x: x if pd.isnull(x) else x[0]))

# FILLING FORWARD: repeat the single-row values on every symptom row
norm_list = ['age', 'bmi', 'height', 'weight', 'sex', 'someinfo1', 'someinfo2', 'someinfo3',
             'info1', 'info2', 'info3', 'info4', 'name', 'value']

final_df[norm_list] = final_df[norm_list].ffill()
</code></pre>

Output

<pre><code>print(final_df)

#     age  bmi  height   sex        someinfo1       someinfo2        someinfo3  weight   name socrates.associations socrates.onsetType socrates.timeCourse   info1   info2  info3  info4    name     value
# 0  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name1         associations1         onsetType1         timeCourse1  219.59  129.18  41.15  94.19  name1   valuelab
# 1  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name2                   NaN                NaN         timeCourse2  219.59  129.18  41.15  94.19  name1   valuelab
# 2  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name3                   NaN         onsetType2                 NaN  219.59  129.18  41.15  94.19  name1   valuelab
# 3  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name4                   NaN         onsetType3                 NaN  219.59  129.18  41.15  94.19  name1   valuelab
# 4  98.0  5.0   160.0  male  some_more_info1  some_more_inf2  some_more_info3   139.0  name5         associations2                NaN                 NaN  219.59  129.18  41.15  94.19  name1   valuelab
</code></pre>

I want to apply machine learning to a dataset that is a bit too complex. I want to work with pandas and then use some of the built-in models in scikit-learn.

The data is given in a JSON file; a sample looks like this:

<pre><code>{
    "demo_Profile": {
        "sex": "male",
        "age": 98,
        "height": 160,
        "weight": 139,
        "bmi": 5,
        "someinfo1": [
            "some_more_info1"
        ],
        "someinfo2": [
            "some_more_inf2"
        ],
        "someinfo3": [
            "some_more_info3"
        ]
    },
    "event": {
        "info_personal": {
            "info1": 219.59,
            "info2": 129.18,
            "info3": 41.15,
            "info4": 94.19
        },
        "symptoms": [
            {
                "name": "name1",
                "socrates": {
                    "associations": [
                        "associations1"
                    ],
                    "onsetType": "onsetType1",
                    "timeCourse": "timeCourse1"
                }
            },
            {
                "name": "name2",
                "socrates": {
                    "timeCourse": "timeCourse2"
                }
            },
            {
                "name": "name3",
                "socrates": {
                    "onsetType": "onsetType2"
                }
            },
            {
                "name": "name4",
                "socrates": {
                    "onsetType": "onsetType3"
                }
            },
            {
                "name": "name5",
                "socrates": {
                    "associations": [
                        "associations2"
                    ]
                }
            }
        ],
        "labs": [
            {
                "name": "name1 ",
                "value": "valuelab"
            }
        ]
    }
}
</code></pre>

I want to create a pandas data frame from this kind of nested data, but I don't know how to build a data frame that takes the nested parameters into account alongside the single-valued ones.

For example, I don't know how to merge "demo_Profile", which contains single-valued parameters, with "symptoms", which is a list of dictionaries whose values are in some cases single values and in other cases lists.

Does anybody know a way to deal with this?

EDIT

The JSON shown above is just one example. In other cases, the number of values in the lists may differ, as may the number of symptoms, so the structure shown above is not fixed for every case.

building a data frame with pandas out of a nested structure in python
