# Simple C++ solution using hash table and bucket sort

•

```cpp
class Solution {
public:
    vector<int> topKFrequent(vector<int>& nums, int k) {
        // Count how often each value occurs.
        unordered_map<int, int> m;
        for (int num : nums)
            ++m[num];

        // Bucket values by frequency; a frequency can be at most nums.size().
        vector<vector<int>> buckets(nums.size() + 1);
        for (const auto &p : m)
            buckets[p.second].push_back(p.first);

        // Walk buckets from the highest frequency down until k values are collected.
        vector<int> ans;
        for (int i = (int)buckets.size() - 1; i >= 0 && (int)ans.size() < k; --i) {
            for (int num : buckets[i]) {
                ans.push_back(num);
                if ((int)ans.size() == k)
                    break;
            }
        }
        return ans;
    }
};
```
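
For anyone who wants to try it, a minimal test harness (my own sketch; the input is made up, and it assumes the `Solution` class above is pasted in):

```cpp
#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;

// ... paste the Solution class from above here ...

int main() {
    vector<int> nums = {1, 1, 1, 2, 2, 3};  // 1 occurs 3x, 2 occurs 2x, 3 once
    Solution sol;
    for (int num : sol.topKFrequent(nums, 2))
        cout << num << ' ';                 // prints: 1 2
    cout << '\n';
}
```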

• Good solution, I think; the only drawback is that the space complexity is linear.

• Regarding `++m[num]`: your assumption is that `m[i]` is automatically initialized to 0 for any new key `i`.
Do you have a reference for that?
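
To make the assumption concrete, here is a minimal check (cppreference confirms that `operator[]` value-initializes the mapped value for a missing key, which means 0 for `int`):

```cpp
#include <cassert>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    assert(m[42] == 0);  // missing key: the mapped int is value-initialized to 0
    ++m[42];             // so ++m[num] counts correctly the first time a num is seen
    assert(m[42] == 1);
}
```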

• Thanks.

• Here is a faster solution.

Instead of allocating excessive memory for a big bucket array, I just use a small `std::map` object, `freq_nums_map`.

Notice that in most practical scenarios frequencies follow a power-law distribution, so I can assume the size of `freq_nums_map`, call it m, is much smaller than n (m << n).

Time complexity: O(n log m), which is effectively O(n) when m is small.

```cpp
class Solution {
public:
    vector<int> topKFrequent(vector<int>& nums, int k) {
        // Count occurrences of each value: O(n).
        unordered_map<int, int> num_freq_map;
        for (const auto &ele : nums) {
            ++num_freq_map[ele];
        }

        // Group values by frequency in an ordered map: O(n log m), where m is
        // the number of distinct frequencies (usually small, since frequencies
        // tend to follow a power-law distribution in practice).
        map<int, vector<int>> freq_nums_map;
        for (const auto &ele : num_freq_map) {
            freq_nums_map[ele.second].push_back(ele.first);
        }

        // Collect values from the highest frequency down until we have k.
        vector<int> result;
        for (auto it = freq_nums_map.rbegin(); it != freq_nums_map.rend(); ++it) {
            for (const auto &ele : it->second) {
                if ((int)result.size() >= k)
                    return result;
                result.push_back(ele);
            }
        }
        return result;
    }
};
```

• Yes, if m is much less than n. But what if m == n? Then the complexity would be O(n log n). That is why I used the big-array bucket sort: it trades space for time.

• My reasoning is that in most practical scenarios frequencies follow a power-law distribution, so I can assume m << n, unless the input data are artificial or we are dealing with certain rare conditions.
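
A quick, made-up simulation of that claim: draw n samples with Zipf-like weights and count how many distinct frequencies m actually appear. All names and constants below are illustrative, not part of either solution:

```cpp
#include <iostream>
#include <random>
#include <set>
#include <unordered_map>
#include <vector>
using namespace std;

int main() {
    // Sample n values from a universe with Zipf-like weights 1/rank.
    const int n = 100000, universe = 10000;
    vector<double> weights(universe);
    for (int r = 0; r < universe; ++r)
        weights[r] = 1.0 / (r + 1);
    mt19937 gen(42);
    discrete_distribution<int> zipf(weights.begin(), weights.end());

    unordered_map<int, int> freq;   // value -> frequency
    for (int i = 0; i < n; ++i)
        ++freq[zipf(gen)];

    set<int> distinct_freqs;        // m = number of distinct frequency values
    for (const auto &p : freq)
        distinct_freqs.insert(p.second);

    // On skewed data like this, m comes out orders of magnitude smaller than n.
    cout << "n = " << n << ", m = " << distinct_freqs.size() << '\n';
}
```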

• If nums.size() is really big, for example 50000000000, both the unordered_map and the vector<vector<int>> buckets may take a really long time to allocate that much memory, or even fail to do so, and resize()/rehash() may also limit performance.

So this solution is only conditionally fast (O(n)).
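
One possible mitigation for the rehashing part, sketched under the assumption that the data fits in memory at all: reserve the hash table for the worst case up front so counting never rehashes. `countFrequencies` is a hypothetical helper, not part of either solution:

```cpp
#include <unordered_map>
#include <vector>
using namespace std;

unordered_map<int, int> countFrequencies(const vector<int>& nums) {
    unordered_map<int, int> freq;
    // Reserving for the worst case (all values distinct) avoids repeated
    // rehashing while counting, at the cost of one big up-front allocation.
    freq.reserve(nums.size());
    for (int num : nums)
        ++freq[num];
    return freq;
}
```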

• I think in the worst case @Aeonaxx's solution is O(n), and @xsh6528's solution is O(n + (n - m^2 + m) * log m), where m = O(sqrt(n)), because producing m distinct frequencies requires at least 1 + 2 + ... + m = m(m+1)/2 numbers.

However, unordered_map's O(n) also hides a constant that should not be ignored: hash-table insertion is O(1), but with a large constant factor.

So I think the two solutions differ only slightly in speed, because the unordered_map accounts for most of the time in both.

BY THE WAY, there is a subtle point in @Aeonaxx's solution:

```cpp
for (int i = buckets.size() - 1; i >= 0 && ans.size() < k; --i) {
    for (int num : buckets[i]) {
        ans.push_back(num);
        if (ans.size() == k)
            break;
    }
}
```

The `break;` only exits the inner for(num) loop, while the outer for(i) loop keeps going; it is only the `ans.size() < k` check in the outer loop's condition that stops ans from growing past k elements.
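
A sketch of a structure that makes the intent explicit by returning out of both loops at once; `buckets`, `ans`, and `k` are as in the quoted code:

```cpp
for (int i = (int)buckets.size() - 1; i >= 0; --i) {
    for (int num : buckets[i]) {
        ans.push_back(num);
        if ((int)ans.size() == k)
            return ans;  // leaves both loops in one step
    }
}
return ans;              // only reached if k exceeds the number of distinct values
```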

•

```cpp
vector<vector<int>> buckets(nums.size() + 1);
```

I am curious why the size of buckets is nums.size() + 1. Is it to account for the case of 0 frequency?

Appreciate any comments.
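
My reading, for what it's worth, sketched as a toy example: `buckets` is indexed by frequency, and the largest possible frequency is nums.size() itself, when every element is identical, so indices 0 through nums.size() must all be valid:

```cpp
#include <vector>
using namespace std;

int main() {
    // buckets is indexed by frequency; the largest possible frequency is
    // nums.size() (every element identical), hence nums.size() + 1 slots.
    vector<int> nums = {7, 7, 7};
    vector<vector<int>> buckets(nums.size() + 1);  // size 4: indices 0..3
    buckets[3].push_back(7);   // frequency 3 == nums.size() is a valid index
    // Index 0 is never used, since every counted value occurs at least once.
}
```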
