Simple C++ solution using hash table and bucket sort


  • 28
    A
    class Solution {
    public:
        vector<int> topKFrequent(vector<int>& nums, int k) {
            // Count how many times each value occurs.
            unordered_map<int, int> m;
            for (int num : nums)
                ++m[num];
            
            // Bucket sort by frequency: a value can occur at most nums.size() times,
            // so frequencies 1..nums.size() all get a valid slot (index 0 stays unused).
            vector<vector<int>> buckets(nums.size() + 1); 
            for (const auto& p : m)
                buckets[p.second].push_back(p.first);
            
            // Scan from the highest frequency down; the ans.size() < k condition
            // in the outer loop stops the scan once k values have been collected.
            vector<int> ans;
            for (int i = buckets.size() - 1; i >= 0 && ans.size() < k; --i) {
                for (int num : buckets[i]) {
                    ans.push_back(num);
                    if (ans.size() == k)
                        break;
                }
            }
            return ans;
        }
    };
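
    Example: with nums = {1, 1, 1, 2, 2, 3} and k = 2, the counts are {1:3, 2:2, 3:1}, so scanning the buckets from the highest frequency down returns {1, 2}:

    vector<int> nums{1, 1, 1, 2, 2, 3};
    vector<int> top = Solution().topKFrequent(nums, 2);   // top == {1, 2}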

  • 0

    Good solution, I think; the only drawback is that the space complexity is linear.


  • 0
    D

    Regarding "++m[num]": your assumption is that m[i] is automatically initialized to 0 for all i.
    Do you have a reference?


  • 0

    Yes: std::unordered_map's operator[] value-initializes the mapped value when the key is absent, so for int it starts at 0. See cppreference on unordered_map::operator[].
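
    A minimal check (the zero start is guaranteed by the standard; 42 and 7 are just arbitrary keys):

    #include <cassert>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m;
        ++m[42];              // absent key: operator[] inserts a value-initialized (0) entry, then increments it
        assert(m[42] == 1);
        assert(m[7] == 0);    // even a plain read of an absent key inserts a zero-valued entry
    }
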
  • 0
    D

    thanks ......


  • 0
    X

    Here is a faster solution.

    Instead of allocating excessive memory for a big bucket array, I just use a small std::map object, freq_nums_map.

    Notice that in most practical scenarios frequencies follow a power-law distribution, so I can assume that the size m of the std::map freq_nums_map is much smaller than n (m << n).

    Time complexity: O(n log m), which is effectively O(n) when m is small.

    class Solution {
    public:
        vector<int> topKFrequent(vector<int>& nums, int k) {
            // Count occurrences of each value.
            unordered_map<int, int> num_freq_map;
            for (const auto &ele : nums) {
                ++num_freq_map[ele];
            }

            // Group values by frequency in an ordered map: n * log m overall
            // (m is small when frequencies follow a power-law distribution).
            map<int, vector<int>> freq_nums_map;
            for (const auto &ele : num_freq_map) {
                freq_nums_map[ele.second].push_back(ele.first);
            }

            // Walk frequencies from highest to lowest until k values are collected.
            vector<int> result;
            for (auto it = freq_nums_map.rbegin(); it != freq_nums_map.rend(); ++it) {
                for (const auto &ele : it->second) {
                    if (result.size() >= k)
                        return result;
                    result.push_back(ele);
                }
            }
            return result;
        }
    };
    

  • 0
    A

    Yes, if m is much less than n. But in the worst case m grows with n (up to about sqrt(n)), and the complexity becomes n log n. That was why I used the big-array bucket sort: it trades space for time.
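
    For intuition, a hypothetical worst-case-style input (made-up values, just to show how m grows): if value i occurs exactly i times for i = 1..m, then n = m(m+1)/2, so m is on the order of sqrt(n):

    // Build m distinct frequencies out of n = m(m+1)/2 elements.
    int m = 4;
    vector<int> nums;
    for (int i = 1; i <= m; ++i)
        for (int j = 0; j < i; ++j)
            nums.push_back(i);
    // nums = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4}; freq_nums_map would hold m == 4 keys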


  • 0
    X

    My reason is that frequencies in most practical scenarios follow a power-law distribution, so I can assume m << n, unless the input data are artificial or we are dealing with certain rare conditions.


  • 0
    L

    If nums.size() is really big, for example 50,000,000,000, both the unordered_map and the vector<vector<int>> buckets may take a really long time to allocate that much memory, or even fail to do so, and resize()/rehash() may also limit the performance.

    So this solution is only conditionally fast (O(n)).
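
    One mitigation sketch, assuming the element count is known up front: reserving the hash table's capacity avoids the intermediate rehashes, though it does not help if the allocation itself fails:

    unordered_map<int, int> m;
    m.reserve(nums.size());   // pre-size the bucket array so insertions never trigger a rehash
    for (int num : nums)
        ++m[num];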


  • 0

    I think in the worst case, @Aeonaxx's solution is O(n), and @xsh6528's solution is O(n + (n - m^2 + m) * log(m)), where m = O(sqrt(n)): with m distinct frequencies the input holds at least 1 + 2 + ... + m = m(m+1)/2 numbers (for example, n = 45 supports at most m = 9 distinct frequencies, since 1 + 2 + ... + 9 = 45).

    However, unordered_map's O(n) also hides a constant that should not be totally ignored: hash-table insertion is O(1) amortized, but with a large constant factor.

    So I think the two solutions differ only slightly in speed, because the unordered_map pass takes the majority of the time.

    BY THE WAY, one subtle point in @Aeonaxx's solution:

            for (int i = buckets.size() - 1; i >= 0 && ans.size() < k; --i) {
                for (int num : buckets[i]) {
                    ans.push_back(num);
                    if (ans.size() == k)
                        break;
                }
            }
    

    When it hits 'break;', it only breaks the inner for(num) loop; the outer for(i) loop would keep going on its own. It is the ans.size() < k check in the outer loop's condition that actually stops the scan, so that check must not be dropped, otherwise ans.size() could exceed k.
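
    A variant that makes the early exit explicit (just a sketch; returning from inside the inner loop leaves both loops and the function at once, so it does not rely on the outer condition):

            vector<int> ans;
            for (int i = buckets.size() - 1; i >= 0; --i) {
                for (int num : buckets[i]) {
                    ans.push_back(num);
                    if (ans.size() == k)
                        return ans;   // exits both loops immediately
                }
            }
            return ans;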


  • 0
    X
    vector<vector<int>> buckets(nums.size() + 1);

    I am curious why the size of buckets is nums.size() + 1. Is it to account for the case of 0 frequency?

    I'd appreciate any comments.
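
    A likely answer: a value can occur at most nums.size() times, so frequencies range from 1 to nums.size(); sizing the array nums.size() + 1 lets buckets[frequency] be used as a direct index, with slot 0 (frequency zero) simply left unused. A small illustration:

    // Extreme case: every element is the same value, so its frequency equals nums.size().
    vector<int> nums = {7, 7, 7};
    vector<vector<int>> buckets(nums.size() + 1);   // slots 0..3
    buckets[3].push_back(7);                        // valid only because of the +1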

