01
—
Topk by quicksort
问题是求出数据集中,按照某个规则定义的元素大小,取前k个元素。
为了简化起见,直接求数值型数组的前k个最大元素。
'''
topk by quick sort
assert: k <= len(arr)
'''
def topkByQuickSort(k,arr=None):
topk=[]
return quickSort(0,len(arr)-1,k,topk,arr)
'''
quick sort to find top k for an unsorted array
'''
def quickSort(lo,hi,k,topk,arr):
# this is worst condition for topk
if lo>=hi:
index = len(arr) - k
while index < len(arr):
topk.append(arr[index])
index+=1
return topk
#following basic quick sort idea
i = lo
j = hi
pivot = arr[lo]
while i < j:
while j > i and pivot <= arr[j]:
j-=1
arr[i] = arr[j]
while i < j and arr[i] < pivot:
i+=1
arr[j] = arr[i]
#put pivot to i index
arr[i] = pivot
# following is important, if i less than length of array minus k, then iterator to high part;
# elif bigger, then iterator to low part
# else equal, get it!
if i < len(arr) - k :
quickSort(i+1,hi,k,topk,arr)
elif i > len(arr) - k:
quickSort(lo,i-1,k,topk,arr)
else:
index = i
while index < len(arr):
topk.append(arr[index])
index+=1
return topk
02
—
Test
topk = topkByQuickSort(1,arr=[3,5,1,6,9,8,5,12])
[print(item) for item in topk]
12
topk = topkByQuickSort(2,arr=[3,5,1,6,9,8,5,12])
[print(item) for item in topk]
9
12
topk = topkByQuickSort(8,arr=[32,5,7,6,13,9,8,4,12])
[print(item) for item in topk]
5
7
6
8
9
12
13
32
03
—
总结
快速排序思想求topk,优点是时间复杂度小,缺点是大数据时,全部一次加入内存放不下时,不太适合用这种方法,可以基于堆排序思想求topk,详细请参考:
机器学习|海量数据求top K 之最小堆实现
本文分享自 程序员郭震zhenguo 微信公众号,前往查看
如有侵权,请联系 cloudcommunity@tencent.com 删除。
本文参与 腾讯云自媒体同步曝光计划 ,欢迎热爱写作的你一起参与!