
How to create a nested dictionary from a text file

Stack Overflow user
Asked 2022-03-11 02:34:40
Answers: 4 · Views: 84 · Followers: 0 · Votes: 2

My file looks like this:

Intestinal infectious diseases (001-003) 
001 Cholera
002 Typhoid and paratyphoid fevers
003 Other salmonella infections

Tuberculosis (004-006)
004 Primary tuberculous infection
005 Pulmonary tuberculosis
006 Other respiratory tuberculosis

.
.
.

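I need to create a nested dictionary in which each disease group is a key and its value is a dictionary mapping disease codes to disease names. I'm having trouble sorting the disease codes into their own disease groups. Here is what I have so far: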

import json

icd9_encyclopedia={}
lines = []    
f = open("icd9_info.txt", 'r')
for line in f:
    line = line.rstrip("\n")
    if line[0].isnumeric() == True:
        icd9_encyclopedia[line] = ??? 
        
    


f.close()

4 Answers

Stack Overflow user

Posted 2022-03-11 03:00:34

I built the nested dictionary easily with defaultdict, like this:

from collections import defaultdict

icd9_encyclopedia = defaultdict(dict)
disease_group = ""
with open("icd9_info.txt", 'r') as f:
    for line in f:
        line = line.strip() # remove the trailing '\n' and any surrounding whitespace
        if not line: # skip blank lines
            continue
        if not line[0].isdigit():
            disease_group = line # temporarily save current disease group name for the following lines
        else:
            code, name = line.split(maxsplit=1)
            icd9_encyclopedia[disease_group][code] = name

for key, value in icd9_encyclopedia.items():
    print(key, value)
    
#Intestinal infectious diseases (001-003) {'001': 'Cholera', '002': 'Typhoid and paratyphoid fevers', '003': 'Other salmonella infections'}
#Tuberculosis (004-006) {'004': 'Primary tuberculous infection', '005': 'Pulmonary tuberculosis', '006': 'Other respiratory tuberculosis'}

You can find more details about defaultdict here: https://www.geeksforgeeks.org/defaultdict-in-python/
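
To show why `defaultdict(dict)` removes the need to check whether a group already exists, here is a minimal sketch that parses an in-memory sample instead of a file. The function name `parse_icd9` and the `SAMPLE` text are assumptions made for illustration; they are not part of the answer above.

```python
from collections import defaultdict

# Hypothetical sample in the same layout as the question's file.
SAMPLE = """\
Intestinal infectious diseases (001-003)
001 Cholera
002 Typhoid and paratyphoid fevers

Tuberculosis (004-006)
004 Primary tuberculous infection
"""


def parse_icd9(text):
    """Group 'code name' lines under the most recent header line."""
    groups = defaultdict(dict)  # a missing key gets a fresh empty dict on first access
    current_group = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank separator line
        if line[0].isdigit():
            if current_group is None:
                continue  # illustrative choice: ignore codes that precede any header
            code, name = line.split(maxsplit=1)
            groups[current_group][code] = name  # inner dict is created on demand
        else:
            current_group = line
    # Return a plain dict so later lookups of unknown groups raise KeyError
    # instead of silently creating empty entries.
    return dict(groups)


result = parse_icd9(SAMPLE)
print(result)
```

Converting back to a plain `dict` at the end is optional; it only matters if code that consumes the result should not create new groups by accident.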

Votes: 2

Stack Overflow user

Posted 2022-03-11 03:07:10

Here is another approach that uses only basic Python:

from pprint import pprint

icd9_encyclopedia={}

key = None
item = {}

with open("icd9_info.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            # Skip the blank lines that separate groups
            continue
        if not line[0].isdigit():
            # Start a new item
            if key:
                # Store the prior item in the main dictionary
                icd9_encyclopedia[key] = item
            # Initialize the new item
            key = line
            item = {}
        else:
            # A detail entry - add it to the current item
            num, rest = line.split(' ', 1)
            item[num] = rest

# Store the final item to the dictionary
if key:
    icd9_encyclopedia[key] = item

pprint(icd9_encyclopedia)

Result:

{'Intestinal infectious diseases (001-003)': {'001': 'Cholera',
                                              '002': 'Typhoid and paratyphoid '
                                                     'fevers',
                                              '003': 'Other salmonella '
                                                     'infections'},
 'Tuberculosis (004-006)': {'004': 'Primary tuberculous infection',
                            '005': 'Pulmonary tuberculosis',
                            '006': 'Other respiratory tuberculosis'}}
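
The bookkeeping in this answer (storing the prior item when a new header appears, then once more after the loop) can be avoided by registering the inner dictionary in the outer one as soon as its header is read. The following is only a sketch of that variation; `build_groups` and the sample lines are hypothetical names and data, not taken from the answer.

```python
def build_groups(lines):
    """Build {group_header: {code: name}} from an iterable of text lines."""
    encyclopedia = {}
    item = None  # inner dict of the group currently being filled
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank separator line
        if not line[0].isdigit():
            # Header line: create (or reuse) its inner dict right away,
            # so no "store the final item" step is needed after the loop.
            item = encyclopedia.setdefault(line, {})
        elif item is not None:
            num, rest = line.split(" ", 1)
            item[num] = rest
    return encyclopedia


# Hypothetical sample; a file object opened with open(...) could be passed instead.
demo = build_groups([
    "Tuberculosis (004-006)\n",
    "004 Primary tuberculous infection\n",
    "\n",
    "Intestinal infectious diseases (001-003)\n",
    "001 Cholera\n",
])
print(demo)
```

Because `item` refers to the same dictionary object that is stored in `encyclopedia`, later assignments to `item[num]` are visible through the outer dictionary without any extra copy step.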
Votes: 2

Stack Overflow user

Posted 2022-03-11 03:12:28

Solution

import itertools
from pathlib import Path

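# load text lines, stripping whitespace so whitespace-only lines count as group separators
lines = [line.strip() for line in Path('data.txt').read_text().splitlines()]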

# build output dictionary
icd9_encyclopedia = {
    # build single group dictionary
    group_name: {
        int(code): disease_name
        # split each disease line into code and text name
        for disease_string in disease_strings
        for (code, _, disease_name) in [disease_string.partition(' ')]
    }
    # get groups separated by an empty line
    # isolate first item in each group as its name
    for x, (group_name, *disease_strings) in itertools.groupby(lines, bool) if x
}

Result

{'Intestinal infectious diseases (001-003)': {1: 'Cholera',
                                              2: 'Typhoid and paratyphoid '
                                                 'fevers',
                                              3: 'Other salmonella infections'},
 'Tuberculosis (004-006)': {4: 'Primary tuberculous infection',
                            5: 'Pulmonary tuberculosis',
                            6: 'Other respiratory tuberculosis'}}
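
The comprehension above relies on `itertools.groupby(lines, bool)` to split the file at empty lines, which may not be obvious at first read. As a small illustration using made-up sample lines (the group names and entries below are placeholders, not real ICD-9 data), this sketch shows the intermediate values:

```python
import itertools

# Placeholder lines in the same layout: header, entries, empty separator.
lines = ["Group A", "001 First", "002 Second", "", "Group B", "003 Third", ""]

# groupby(lines, bool) yields (key, run) pairs where key is bool(line):
# True for a run of consecutive non-empty lines, False for a run of empty ones.
runs = [(is_text, list(run)) for is_text, run in itertools.groupby(lines, bool)]
print(runs)

# Keep only the non-empty runs; the first line of each run is the header.
blocks = [(run[0], run[1:]) for is_text, run in runs if is_text]
print(blocks)

# str.partition(' ') splits at the first space only, so names may contain spaces.
code, _, name = "002 Second entry name".partition(" ")
print(int(code), name)
```

Note that `int(code)` drops the leading zeros (`'002'` becomes `2`), which is why the keys in the result above are integers rather than the three-character strings produced by the other answers.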
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/71433034
