我需要创建一个函数,根据给定的数据,它确定特定河流的监测站数目。然后,该函数需要返回(河名、站数)元组列表中的第一个N个对象,这些元组按站数的降序排序。
不过,有些河流可能有相同数目的车站,这些站需要被视为返回名单上的一个条目,但我不知道如何做到这一点。(返回的列表可能有超过N个对象,但只有N个数字.如果这有意义的话)
到目前为止,我创建的函数如下:
def rivers_by_station_number(stations, N):
riv_names = set()
for station in stations:
riv_names.add(station.river)
num_stations = []
for name in riv_names:
n = 0
for station in stations:
if station.river == name:
n += 1
else:
pass
num_stations.append(n)
lst_tuple = list(zip(riv_names, num_stations))
lst_tuple_sort = sorted(lst_tuple, key=lambda x: x[1], reverse=True)
return lst_tuple_sort[:N]
是否有一个排序函数,在这里我可以返回排序列表的第一个N个对象,同时考虑相同的数字作为一个单数项?
额外信息
当我运行该函数时,其中N= 9,得到以下结果:
[('River Thames', 55), ('River Avon', 32), ('River Great Ouse', 30), ('River Derwent', 25), ('River Aire', 24), ('River Calder', 22), ('River Severn', 21), ('River Stour', 19), ('River Ouse', 18)]
因此,对我来说幸运的是,排序列表中前9个对象中没有一个有相同数量的监测站,但是,我仍然希望在我的函数中实现上述功能,因为数据总是会发生变化。
非常感谢!
发布于 2021-02-06 22:23:35
没有内置的功能可以实现您要求的功能(据我所知),所以最好的方法似乎基本上是您正在做的事情,按站数对河流进行分组,按站数进行排序,然后从排序列表中提取第一个N
。
我还会将您的代码分解为两个不同的函数:一个接收站点列表并按河名收集它们,另一个函数获取这些(河名、站点计数)对,并提取它们的第一个N
。
利用河流收集车站的功能
真正做到这一点的唯一方法是遍历所有的站点并收集它们。
from collections import Counter
def collect_stations( stations ):
"""
:param stations: List of station objects.
:returns: Dictionary like object of name-station count pairs.
"""
river_count = {}
names = [ s.river for s in stations ]
return Counter( names )
函数返回第一个N
站点。
这里有一个更紧凑的版本
def highest_counts( river_stations, N, flatten = True ):
"""
:param river_stations: Dictionary like object of name-count pairs.
:param N: Number of count groups to return.
:param flatten: Flatten list of rivers.
:returns: If flatten is True returns a list of ( name, count ) tuples of N unique counts. i.e. Rivers with the same number of counts are treated as one element. If flatten is False, a dictionary of { count: [ ( name, count ) ] is returned, with N count keys.
"""
# group rivers by number of stations
grouped = {}
for name, count in river_stations.items():
if count not in grouped:
# add number group if it doesn't exist
grouped[ count ] = []
grouped[ count ].append( ( name, count ) )
# sort groups by number of stations
grouped = [ ( c, l ) for c, l in grouped.items() ]
grouped.sort( key = lambda x: x[ 0 ], reverse = True )
# get first N number groups
stats = grouped[ :N ]
if flatten:
stats = [
river
for num_list in stats
for river in num_list[ 1 ]
]
return stats
另一种方法是对初始列表进行排序,然后接受元素,直到看到N
站点的编号为止。
from collections import Counter
def highest_counts( river_stations, N ):
"""
:param river_stations: Dictionary like object of name-count pairs.
:param N: Number of count groups to return.
:returns: List of ( name, count ) tuples of N unique counts. i.e. Rivers with the same number of counts are treated as one element.
"""
# sorts by number of stations
river_stations_list = [ ( name, count ) for name, count in river_stations.items() ]
s = sorted( river_stations_list, key = lambda x: x[ 1 ], reverse = True )
# gets number of stations for each element
nums = [ x[ 1 ] for x in s ]
# calculates how many indices incorporate first N number groups
freqs = list( Counter( nums ).values() )
ind = sum( freqs[ :N ] )
# return first elements that incorporate N number groups
return s[ :ind ]
通过快速的性能检查,第二个版本对于更大的输入变得更快了。
最终函数
最后一个函数将将上述两种功能结合起来。
def rivers_by_station_number( stations, N ):
"""
:param stations: List of station objects.
:param N: Number of count groups to return.
:returns: List of ( name, count ) tuples of N unique counts. i.e. Rivers with the same number of counts are treated as one element.
"""
collected = collect_stations( stations ):
return highest_counts( collected, N )
发布于 2021-02-06 22:47:42
您的代码效率不高,首先我将优化它:
def rivers_by_station_number(stations, N):
river_station = {}
for station in stations:
river_station[station.river] = river_station.get(station.river, 0) + 1
sorted_river_station = sorted(river_station.items(), key=lambda x: x[1], reverse = True)
length = len(sorted_river_station)
if N>= length: return sorted_river_station
min_station_count = sorted_river_station[N-1][1]
while N<length and sorted_river_station[n] == min_station_count:
N+=1
return sorted_river_station[:N]
我所做的是,我找到了第N河的站数,从那条河一直循环到最后,同时检查剩下的河流是否有相同数量的站。
https://stackoverflow.com/questions/66081545
复制相似问题