I have a dictionary like this:
matches = {'2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7'],
           'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku'],
           'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.'],
           'Fukuoka Bldg. 8-7 Yaesu Chome': ['2-8-7 Yaesu, Chuo-Ku',
                                             'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
           'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building 9Th Floor'],
           'Fukuoka Building 9Th Floor': ['Fukuoka Bldg. 9Th Fl',
                                          'Fukuoka Building, 9Th -10Th Flr.']}
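I want to group the entries together by following the links between them (through either keys or values). The key of each group can be any member, for example the first key you encounter, which serves as the starting point.
This is the output I am expecting: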
{'2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7',
                          '2-8-7 Yaesu, Chuo-Ku',
                          'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
                          'Fukuoka Bldg. 8-7 Yaesu Chome'],
 'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.',
                             'Fukuoka Bldg. 9Th Fl',
                             'Fukuoka Building 9Th Floor',
                             'Fukuoka Bldg. 9Th Fl']}
I tried this:
unique_lst = set()
merged_matches = dict()
for key, values in matches.items():
    if key not in unique_lst:
        values_lst = []
        for v in values:
            output = matches.get(v)
            for subkeys, subvals in matches.items():
                if key != subkeys and v != subkeys:
                    keyvals = [subkeys] + list(subvals)
                    if v in keyvals:
                        values_lst.extend(keyvals)
            if output:
                values_lst.extend(output)
            values_lst.append(v)
        values_lst = [i for i in values_lst if i != key]
        values_lst = values_lst + [key]
        for v in values_lst:
            unique_lst.add(v)
        merged_matches[key] = values_lst
This is my output:
# print(merged_matches)
{'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building, 9Th -10Th Flr.',
                          'Fukuoka Building 9Th Floor',
                          'Fukuoka Bldg. 9Th Fl'],
 'Fukuoka Bldg. 8-7 Yaesu Chome': ['Chuo Ward, Yaesu 2-8-7',
                                   '2-8-7 Yaesu, Chuo-Ku',
                                   'Chuo Ward, Yaesu 2-8-7',
                                   '2-8-7 Yaesu, Chuo-Ku',
                                   'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
                                   'Fukuoka Bldg. 8-7 Yaesu Chome'],
 'Fukuoka Bldg 10Th Floor': ['Fukuoka Building 9Th Floor',
                             'Fukuoka Bldg. 9Th Fl',
                             'Fukuoka Building, 9Th -10Th Flr.',
                             'Fukuoka Building, 9Th -10Th Flr.',
                             'Fukuoka Bldg 10Th Floor']}
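Posted on 2022-04-27 22:37:58
IMO, the problem boils down to finding the connected components of the graph derived from the dictionary. One way to do this is to use a UnionFind data structure to obtain the list of disjoint sets built from the keys and values.
Then construct a dictionary from the merged sets by picking one element of each set as the key and the remaining elements as the values.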
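from networkx.utils.union_find import UnionFind

c = UnionFind()
# Merge each key with all of its values into one disjoint set.
for k, lst in matches.items():
    c.union(k, *lst)
# Each set becomes one entry: the first element as the key, the rest as values.
# Sets are unordered, so which element ends up as the key is arbitrary.
out = {k: v for k, *v in map(list, c.to_sets())}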
Output:
{'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku',
                            'Fukuoka Bldg. 8-7 Yaesu Chome',
                            'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
 'Fukuoka Building 9Th Floor': ['Fukuoka Bldg 10Th Floor',
                                'Fukuoka Bldg. 9Th Fl',
                                'Fukuoka Building, 9Th -10Th Flr.']}
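If pulling in networkx is not an option, the same union-find idea can be sketched in plain Python. This is only a minimal sketch, not the networkx implementation: the names `merge_matches`, `find` and `union` are hypothetical, and it assumes each group should be keyed by the first member encountered while iterating over `matches` (relying on dict insertion order, Python 3.7+). As in the output above, the key is left out of its own value list.

```python
def merge_matches(matches):
    """Group linked addresses; key each group by its first-seen member."""
    parent = {}  # maps each address to its parent in the union-find forest

    def find(x):
        # Register unseen items as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Attach the root of b's tree under the root of a's tree.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for key, values in matches.items():
        find(key)  # make sure keys with empty lists are registered too
        for value in values:
            union(key, value)

    # Collect members per root; iterating `parent` keeps first-seen order.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return {members[0]: members[1:] for members in groups.values()}


matches = {
    '2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7'],
    'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku'],
    'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.'],
    'Fukuoka Bldg. 8-7 Yaesu Chome': ['2-8-7 Yaesu, Chuo-Ku',
                                      'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
    'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building 9Th Floor'],
    'Fukuoka Building 9Th Floor': ['Fukuoka Bldg. 9Th Fl',
                                   'Fukuoka Building, 9Th -10Th Flr.'],
}
merged = merge_matches(matches)
```

With this data, `merged` should have the two keys the question asks for, '2-8-7 Yaesu, Chuo-Ku' and 'Fukuoka Bldg 10Th Floor', each mapped to the other three addresses of its group.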
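Posted on 2022-04-27 22:35:50
Trying to be creative with networkx, a network graph package. The idea is that your problem can be restated as "sort the addresses into groups such that there are no links between groups".
So networkx sounds like a handy solution here.
To understand the problem: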
Denote the following:
1 '2-8-7 Yaesu, Chuo-Ku',
2 'Chuo Ward, Yaesu 2-8-7',
3 'Fukuoka Bldg 10Th Floor',
4 'Fukuoka Bldg. 8-7 Yaesu Chome',
5 'Fukuoka Bldg. 9Th Fl',
6 'Fukuoka Building 9Th Floor',
7 'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
8 'Fukuoka Building, 9Th -10Th Flr.'
---------
Reading from your original dictionary of lists, the addresses have the following relationships:
1 -> 2
2 -> 1
3 -> 8
4 -> 1
4 -> 7
5 -> 6
6 -> 5
6 -> 8
The final output is two groups:
(1, 2, 7, 4) and (3, 5, 8, 6).
(1, 2, 7, 4) and (3, 5, 8, 6) are what we call connected components. The implementation is therefore as follows:
matches = {'2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7'],
           'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku'],
           'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.'],
           'Fukuoka Bldg. 8-7 Yaesu Chome': ['2-8-7 Yaesu, Chuo-Ku',
                                             'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
           'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building 9Th Floor'],
           'Fukuoka Building 9Th Floor': ['Fukuoka Bldg. 9Th Fl',
                                          'Fukuoka Building, 9Th -10Th Flr.']}
import networkx as nx
G = nx.Graph()
for k, v in matches.items():
    G.add_node(k)
    for i in v:
        G.add_node(i)
        G.add_edge(k, i)
groups = list(nx.connected_components(G))
The final result, groups, is a list of sets, where each set is an isolated group.
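To get from this list of sets to the dictionary shape the question asks for, one possible follow-up step is sketched below. The helper name `groups_to_dict` is hypothetical, and the sketch assumes that each group should be keyed by whichever of its members appears first in `matches`. To keep the example runnable without networkx, `groups` is written out by hand as the two components described above, standing in for the result of `nx.connected_components(G)`.

```python
def groups_to_dict(groups, matches):
    """Turn a list of sets into {first_seen_member: [other members]}."""
    # Record the position at which each address first appears in `matches`.
    order = {}
    for key, values in matches.items():
        for item in (key, *values):
            order.setdefault(item, len(order))

    result = {}
    for group in groups:
        # Sort members by first appearance so the earliest one becomes the key.
        members = sorted(group, key=lambda item: order[item])
        result[members[0]] = members[1:]
    return result


matches = {
    '2-8-7 Yaesu, Chuo-Ku': ['Chuo Ward, Yaesu 2-8-7'],
    'Chuo Ward, Yaesu 2-8-7': ['2-8-7 Yaesu, Chuo-Ku'],
    'Fukuoka Bldg 10Th Floor': ['Fukuoka Building, 9Th -10Th Flr.'],
    'Fukuoka Bldg. 8-7 Yaesu Chome': ['2-8-7 Yaesu, Chuo-Ku',
                                      'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku'],
    'Fukuoka Bldg. 9Th Fl': ['Fukuoka Building 9Th Floor'],
    'Fukuoka Building 9Th Floor': ['Fukuoka Bldg. 9Th Fl',
                                   'Fukuoka Building, 9Th -10Th Flr.'],
}
# Hand-written stand-in for list(nx.connected_components(G)).
groups = [
    {'2-8-7 Yaesu, Chuo-Ku', 'Chuo Ward, Yaesu 2-8-7',
     'Fukuoka Building, 8-7, Yaesu 2 Chome, Chuo-Ku',
     'Fukuoka Bldg. 8-7 Yaesu Chome'},
    {'Fukuoka Bldg 10Th Floor', 'Fukuoka Bldg. 9Th Fl',
     'Fukuoka Building, 9Th -10Th Flr.', 'Fukuoka Building 9Th Floor'},
]
result = groups_to_dict(groups, matches)
```

Sorting by first appearance makes the choice of key deterministic, which a plain conversion of each set to a list would not guarantee.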
https://stackoverflow.com/questions/72035835