给一非空的单词列表,返回前 k 个出现次数最多的单词。
返回的答案应该按单词出现频率由高到低排序。如果不同的单词有相同出现频率,按字母顺序排序。
示例 1:
输入: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
输出: ["i", "love"]
解析: "i" 和 "love" 为出现次数最多的两个单词,均为2次。
注意,按字母顺序 "i" 在 "love" 之前。
示例 2:
输入: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
输出: ["the", "is", "sunny", "day"]
解析: "the", "is", "sunny" 和 "day" 是出现次数最多的四个单词,
出现次数依次为 4, 3, 2 和 1 次。
注意:
扩展练习:
这道题很经典,思路也有很多。我最先想到的解法是先用字典来储存单词出现的个数,再对字典排序,最后拿出前K个,如:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
class Solution(object):
def topKFrequent(self, nums, k):
"""
:type nums: List[int]
:type k: int
:rtype: List[int]
"""
data, res = {}, []
for i in nums:
data[i] = data[i] + 1 if i in data else 1
import operator
sorted_data = sorted(data.items(), key=operator.itemgetter(1), reverse=True)
for i in range(k):
res.append(sorted_data[i][0])
return res
if __name__ == '__main__':
l = ["i", "love", "leetcode", "i", "love", "coding"]
print(Solution().topKFrequent(l, 2))
提交之后发现一种更优雅的解法:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
class Solution(object):
def topKFrequent(self, nums, k):
"""
:type nums: List[int]
:type k: int
:rtype: List[int]
"""
import collections
count = collections.Counter(nums)
heap = [(-freq, word) for word, freq in count.items()]
import heapq
heapq.heapify(heap)
return [heapq.heappop(heap)[1] for _ in range(k)]
if __name__ == '__main__':
l = ["i", "love", "leetcode", "i", "love", "coding"]
print(Solution().topKFrequent(l, 2))
这里引入了两个库,下面去好好看看,分析下
看看源码:
1def __init__(*args, **kwds):
2 '''Create a new, empty Counter object. And if given, count elements
3 from an input iterable. Or, initialize the count from another mapping
4 of elements to their counts.
5
6 >>> c = Counter() # a new, empty counter
7 >>> c = Counter('gallahad') # a new counter from an iterable
8 >>> c = Counter({'a': 4, 'b': 2}) # a new counter from a mapping
9 >>> c = Counter(a=4, b=2) # a new counter from keyword args
10
11 '''
12 if not args:
13 raise TypeError("descriptor '__init__' of 'Counter' object "
14 "needs an argument")
15 self, *args = args
16 if len(args) > 1:
17 raise TypeError('expected at most 1 arguments, got %d' % len(args))
18 super(Counter, self).__init__()
19 self.update(*args, **kwds)
主要是调用了self.update
函数,再来看看self.update
1def update(*args, **kwds):
2 '''Like dict.update() but add counts instead of replacing them.
3
4 Source can be an iterable, a dictionary, or another Counter instance.
5
6 >>> c = Counter('which')
7 >>> c.update('witch') # add elements from another iterable
8 >>> d = Counter('watch')
9 >>> c.update(d) # add elements from another counter
10 >>> c['h'] # four 'h' in which, witch, and watch
11 4
12
13 '''
14
15 if not args:
16 raise TypeError("descriptor 'update' of 'Counter' object "
17 "needs an argument")
18 self, *args = args
19 if len(args) > 1:
20 raise TypeError('expected at most 1 arguments, got %d' % len(args))
21 iterable = args[0] if args else None
22 if iterable is not None:
23 if isinstance(iterable, Mapping):
24 if self:
25 self_get = self.get
26 for elem, count in iterable.items():
27 self[elem] = count + self_get(elem, 0)
28 else:
29 super(Counter, self).update(iterable) # fast path when counter is empty
30 else:
31 _count_elements(self, iterable)
32 if kwds:
33 self.update(kwds)
1def _count_elements(mapping, iterable):
2 'Tally elements from the iterable.'
3 mapping_get = mapping.get
4 for elem in iterable:
5 mapping[elem] = mapping_get(elem, 0) + 1
可以看到这里也是使用便利方法,然后利用字典保存次数。如果counter为空,就直接调用dict中的update。
heapq模块实现了Python中的堆排序,并提供了有关方法。让用Python实现排序算法有了简单快捷的方式。
源码
1def heapify(x):
2 """Transform list into a heap, in-place, in O(len(x)) time."""
3 n = len(x)
4 # Transform bottom-up. The largest index there's any point to looking at
5 # is the largest with a child index in-range, so must have 2*i + 1 < n,
6 # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
7 # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
8 # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
9 for i in reversed(range(n//2)):
10 _siftup(x, i)
最后看看大神关于本题的写法
1class Solution:
2 def topKFrequent(self, words, k):
3 """
4 :type words: List[str]
5 :type k: int
6 :rtype: List[str]
7 """
8 cn = [(-j,i) for i,j in collections.Counter(words).items()]
9
10 return [j[1] for j in heapq.nsmallest(k,cn)]
本文分享自 Python爬虫与算法进阶 微信公众号,前往查看
如有侵权,请联系 cloudcommunity@tencent.com 删除。
本文参与 腾讯云自媒体同步曝光计划 ,欢迎热爱写作的你一起参与!