So, my file looks like this:
Intestinal infectious diseases (001-003)
001 Cholera
002 Typhoid and paratyphoid fevers
003 Other salmonella infections
Tuberculosis (004-006)
004 Primary tuberculous infection
005 Pulmonary tuberculosis
006 Other respiratory tuberculosis
.
.
.
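I'm supposed to create a nested dictionary where the disease groups are the keys and each value is a dictionary mapping disease codes to disease names. I'm having trouble sorting the disease codes into their own disease groups. Here's what I've done so far: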
import json
icd9_encyclopedia={}
lines = []
f = open("icd9_info.txt", 'r')
for line in f:
    line = line.rstrip("\n")
    if line[0].isnumeric() == True:
        icd9_encyclopedia[line] = ???
f.close()
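For the sample lines above, the requested nested dictionary would look like the hand-written literal below. It is included only to make the target shape concrete, and the variable name is taken from the question:

```python
# Target shape for the sample lines in the question (hand-written for illustration):
# outer keys are the group header lines, inner dicts map code -> disease name.
icd9_encyclopedia = {
    "Intestinal infectious diseases (001-003)": {
        "001": "Cholera",
        "002": "Typhoid and paratyphoid fevers",
        "003": "Other salmonella infections",
    },
    "Tuberculosis (004-006)": {
        "004": "Primary tuberculous infection",
        "005": "Pulmonary tuberculosis",
        "006": "Other respiratory tuberculosis",
    },
}

# Looking up one disease is then a two-step index: group first, then code.
print(icd9_encyclopedia["Tuberculosis (004-006)"]["005"])  # Pulmonary tuberculosis
```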
Posted on 2022-03-11 03:00:34
I used defaultdict to build the nested dictionary easily, as follows:
from collections import defaultdict

icd9_encyclopedia = defaultdict(dict)
disease_group = ""
with open("icd9_info.txt", 'r') as f:
    for line in f:
        line = line.rstrip("\n")  # remove the trailing '\n' (safe even if the last line has none)
        if line == "":  # skip blank lines
            continue
        if not line[0].isdigit():
            disease_group = line  # remember the current disease group name for the lines that follow
        else:
            code, name = line.split(maxsplit=1)
            icd9_encyclopedia[disease_group][code] = name

for key, value in icd9_encyclopedia.items():
    print(key, value)
#Intestinal infectious diseases (001-003) {'001': 'Cholera', '002': 'Typhoid and paratyphoid fevers', '003': 'Other salmonella infections'}
#Tuberculosis (004-006) {'004': 'Primary tuberculous infection', '005': 'Pulmonary tuberculosis', '006': 'Other respiratory tuberculosis'}
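The question imports json but never uses it. If the goal is to save the finished dictionary (an assumption), the defaultdict can be passed directly to json.dumps, or to json.dump with an open file, because defaultdict is a dict subclass. The minimal sketch below hard-codes two sample entries instead of reading icd9_info.txt:

```python
import json
from collections import defaultdict

# Hard-coded stand-in for the parsed result (two entries from the sample file).
icd9_encyclopedia = defaultdict(dict)
icd9_encyclopedia["Intestinal infectious diseases (001-003)"]["001"] = "Cholera"
icd9_encyclopedia["Tuberculosis (004-006)"]["004"] = "Primary tuberculous infection"

# defaultdict subclasses dict, so json serializes it like an ordinary dict.
text = json.dumps(icd9_encyclopedia, indent=2)
print(text)

# Loading the text back gives plain nested dicts with the same content.
restored = json.loads(text)
```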
You can find more details about defaultdict here: https://www.geeksforgeeks.org/defaultdict-in-python/
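The example below shows why defaultdict(dict) helps here. When an outer key is missing, defaultdict creates a fresh inner dict on first access, while a plain dict raises KeyError. The group name comes from the sample file; nested, plain and raised are illustrative names:

```python
from collections import defaultdict

# defaultdict(dict) calls dict() to create a missing value on first access,
# so a nested key can be assigned without checking whether the group exists yet.
nested = defaultdict(dict)
nested["Tuberculosis (004-006)"]["004"] = "Primary tuberculous infection"
nested["Tuberculosis (004-006)"]["005"] = "Pulmonary tuberculosis"
print(dict(nested))
# {'Tuberculosis (004-006)': {'004': 'Primary tuberculous infection', '005': 'Pulmonary tuberculosis'}}

# With a plain dict the same first assignment fails, because the outer key is missing.
plain = {}
raised = False
try:
    plain["Tuberculosis (004-006)"]["004"] = "Primary tuberculous infection"
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```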
Posted on 2022-03-11 03:07:10
Here's another approach using plain Python:
from pprint import pprint

icd9_encyclopedia = {}
key = None
item = {}
with open("icd9_info.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines; line[0] would raise IndexError on them
        if not line[0].isdigit():
            # Start a new item
            if key:
                # Store the prior item in the main dictionary
                icd9_encyclopedia[key] = item
            # Initialize the new item
            key = line
            item = {}
        else:
            # A detail entry - add it to the current item
            num, rest = line.split(' ', 1)
            item[num] = rest
# Store the final item in the dictionary
if key:
    icd9_encyclopedia[key] = item
pprint(icd9_encyclopedia)
Result:
{'Intestinal infectious diseases (001-003)': {'001': 'Cholera',
                                              '002': 'Typhoid and paratyphoid '
                                                     'fevers',
                                              '003': 'Other salmonella '
                                                     'infections'},
 'Tuberculosis (004-006)': {'004': 'Primary tuberculous infection',
                            '005': 'Pulmonary tuberculosis',
                            '006': 'Other respiratory tuberculosis'}}
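A variation on the plain-Python approach above uses dict.setdefault, so each inner dictionary is registered as soon as its header appears. That removes the separate "store the final item" step. This is a sketch that assumes the same file layout. parse_icd9 and SAMPLE are hypothetical names, and io.StringIO stands in for icd9_info.txt so the code runs without the file:

```python
import io

# Hypothetical in-memory stand-in for icd9_info.txt so the sketch runs without the file.
SAMPLE = io.StringIO(
    "Intestinal infectious diseases (001-003)\n"
    "001 Cholera\n"
    "002 Typhoid and paratyphoid fevers\n"
    "003 Other salmonella infections\n"
    "\n"
    "Tuberculosis (004-006)\n"
    "004 Primary tuberculous infection\n"
    "005 Pulmonary tuberculosis\n"
    "006 Other respiratory tuberculosis\n"
)


def parse_icd9(lines):
    """Build {group header: {code: name}} using dict.setdefault instead of defaultdict."""
    encyclopedia = {}
    group = None  # inner dict of the most recent header; None until one is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if not line[0].isdigit():
            # Header line: create (or reuse) its inner dict and remember it.
            group = encyclopedia.setdefault(line, {})
        elif group is not None:
            # Code line: split off the code; code lines before any header are skipped.
            code, name = line.split(maxsplit=1)
            group[code] = name
    return encyclopedia


result = parse_icd9(SAMPLE)
print(result["Intestinal infectious diseases (001-003)"]["002"])  # Typhoid and paratyphoid fevers
```

parse_icd9 only needs an iterable of lines, so the real file can be passed the same way, for example `with open("icd9_info.txt") as f: result = parse_icd9(f)`.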
Posted on 2022-03-11 03:12:28
Solution:
import itertools
from pathlib import Path

# load text lines
lines = Path('data.txt').read_text().split('\n')

# build output dictionary
icd9_encyclopedia = {
    # build single group dictionary
    group_name: {
        int(code): disease_name
        # split each disease line into code and text name
        for disease_string in disease_strings
        for (code, _, disease_name) in [disease_string.partition(' ')]
    }
    # get groups separated by an empty line
    # isolate first item in each group as its name
    for x, (group_name, *disease_strings) in itertools.groupby(lines, bool) if x
}
Result:
{'Intestinal infectious diseases (001-003)': {1: 'Cholera',
                                              2: 'Typhoid and paratyphoid '
                                                 'fevers',
                                              3: 'Other salmonella infections'},
 'Tuberculosis (004-006)': {4: 'Primary tuberculous infection',
                            5: 'Pulmonary tuberculosis',
                            6: 'Other respiratory tuberculosis'}}
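The groupby(lines, bool) step relies on blank lines between groups. The sample at the top of the question shows none, and without them the whole file collapses into a single run. A small demonstration with made-up lines ("Group A" and "Group B" are placeholders):

```python
import itertools

# bool("") is False, so groupby(lines, bool) yields alternating runs of
# non-empty lines (key True) and blank lines (key False).
with_blanks = ["Group A", "001 One", "", "Group B", "002 Two"]
runs = [(key, list(group)) for key, group in itertools.groupby(with_blanks, bool)]
print(runs)
# [(True, ['Group A', '001 One']), (False, ['']), (True, ['Group B', '002 Two'])]

# Without a blank line between groups every line lands in one run,
# so only the first header would become a key.
no_blanks = ["Group A", "001 One", "Group B", "002 Two"]
print([list(group) for _, group in itertools.groupby(no_blanks, bool)])
# [['Group A', '001 One', 'Group B', '002 Two']]
```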
https://stackoverflow.com/questions/71433034