I think this problem cannot be solved in O(n) time, can it?


  • 0
    K

    Here is Java code using a PriorityQueue, O(n log k):

    import java.util.*;

    class Solution {
        public List<String> topKFrequent(String[] words, int k) {
            // count frequencies; blank out repeats so each distinct
            // word enters the heap exactly once
            Map<String, Integer> map = new HashMap<>();
            for (int i = 0; i < words.length; i++) {
                int now = map.getOrDefault(words[i], 0);
                map.put(words[i], now + 1);
                if (now > 0) words[i] = "";
            }
            // min-heap of size k: lowest frequency at the head; on ties the
            // lexicographically larger word sits at the head, so the "worst"
            // word is always the one evicted (compare the boxed counts with
            // equals(), since != only works for small cached Integers)
            Queue<String> pq = new PriorityQueue<>((a, b) ->
                    !map.get(a).equals(map.get(b))
                            ? map.get(a) - map.get(b)
                            : b.compareTo(a));
            for (String s : words)
                if (s.length() > 0) {
                    pq.offer(s);
                    if (pq.size() > k) pq.poll();
                }
            // the heap pops worst-first, so prepend to build the answer
            List<String> ans = new LinkedList<>();
            while (!pq.isEmpty()) ans.add(0, pq.poll());
            return ans;
        }
    }
    

    In the extreme case, all words appear exactly once and k = n. Then the task is equivalent to sorting all words in alphabetical order, which costs O(n log n) time with any comparison sort.


  • 0
    V

    I think even in the extreme case the time complexity is still O(n log k), since we only need a partial order, not a fully sorted array. But yes, I agree bucket sort cannot guarantee O(n) in the worst case. Maybe the O(n) in the follow-up refers to average-case time? The more confusing part is the O(k) space requirement: how would we even do the frequency counting in O(k) space?


  • 0
    J

    I think O(n) is possible: use a hashmap to store the frequencies first, then apply the same "quickselect + partition" method we used on "top k elements", sketched below.
    But the space cost would be O(n); I really cannot come up with any solution that uses only O(k) space at the same time.
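
    A rough sketch of that quickselect idea (assuming the frequency map is
    already built and the distinct words are collected into an array; the
    class and method names here are made up):

    import java.util.Map;

    class QuickSelectSketch {
        // ranking used by the problem: higher frequency first,
        // lower alphabetical first on ties
        static int rank(String a, String b, Map<String, Integer> freq) {
            int fa = freq.get(a), fb = freq.get(b);
            return fa != fb ? fb - fa : a.compareTo(b);
        }

        // rearrange so the k top-ranked words occupy indices [0, k),
        // in no particular order
        static void quickSelect(String[] w, int lo, int hi, int k,
                                Map<String, Integer> freq) {
            if (lo >= hi) return;
            String pivot = w[hi];                     // pivot choice, see below
            int i = lo;
            for (int j = lo; j < hi; j++)
                if (rank(w[j], pivot, freq) < 0) {    // w[j] ranks before pivot
                    String t = w[i]; w[i] = w[j]; w[j] = t; i++;
                }
            String t = w[i]; w[i] = w[hi]; w[hi] = t; // place the pivot
            if (i + 1 < k) quickSelect(w, i + 1, hi, k, freq);
            else if (i + 1 > k) quickSelect(w, lo, i - 1, k, freq);
        }
    }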


  • 0
    F

    @Victorcxq Agree with you! O(n) sounds workable, since we can always trade more space for run time. But achieving linear run time with O(k) space is tough; not sure if we need some computer scientist's research paper for that....


  • 0
    V

    @JadenPan If you mean the quickselect based on the partition function from quicksort, you must use the randomized version to achieve expected O(n) time.
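
    A minimal helper for that, assuming the same array/bounds names as the
    sketch above: swap a uniformly random element into the pivot slot before
    partitioning.

    import java.util.concurrent.ThreadLocalRandom;

    class RandomPivot {
        // move a uniformly random element of w[lo..hi] into the last slot,
        // so the partition that follows uses a random pivot; this is what
        // turns worst-case O(n^2) quickselect into expected O(n)
        static void randomizePivot(String[] w, int lo, int hi) {
            int p = lo + ThreadLocalRandom.current().nextInt(hi - lo + 1);
            String t = w[p]; w[p] = w[hi]; w[hi] = t;
        }
    }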


  • 0
    V

    @FF Sure, we need a computer scientist :) Still, a version with O(k) space cost would be useful when we deal with tons of data... The bottleneck is the frequency counting and the alphabetical ordering... If we aim for O(n log k) time and O(k) space, could that be done by splitting the dataset and merging?


  • 0
    J

    @Victorcxq Yes, the randomized version, thank you for pointing that out! For O(k) space, I really cannot come up with any good ideas.


  • 0
    D

    @Victorcxq But the answer must be returned in lower-alphabetical-first order, so we still have to sort on this criterion even after running quickselect. In the worst case, when k = n, that is O(n log n) again. Even when k != n, we get O(n + k log k).
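
    A hedged sketch of that total, reusing the hypothetical QuickSelectSketch
    from above: expected O(n) for the select, plus O(k log k) to sort only the
    k selected words into the required output order.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;

    class SelectThenSort {
        // distinct holds each word exactly once; freq maps word -> count
        static List<String> topK(String[] distinct, int k,
                                 Map<String, Integer> freq) {
            QuickSelectSketch.quickSelect(distinct, 0, distinct.length - 1, k, freq);
            String[] top = Arrays.copyOf(distinct, k);   // the unsorted top k
            Arrays.sort(top, (a, b) -> QuickSelectSketch.rank(a, b, freq));
            return Arrays.asList(top);
        }
    }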


  • 0

    Agree. This follow-up question might just be there to challenge the interviewee.


  • 0
    L

    I actually agree; no idea how you could come up with an O(k) solution. Also, all the Python solutions are O(n log n); I haven't seen one with O(n), because you need to sort elements with the same frequency lexicographically. Bucket sort or quickselect can group the words by frequency, but still wouldn't pass a case like ["aaa","aa","a"], k = 1 without sorting.
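
    A bucket-sort sketch of what I mean (my own names here), showing where the
    lexicographic sort still sneaks in:

    import java.util.*;

    class BucketSortSketch {
        static List<String> topKFrequent(String[] words, int k) {
            Map<String, Integer> freq = new HashMap<>();
            for (String w : words) freq.merge(w, 1, Integer::sum);
            // buckets.get(f) holds every word that occurs exactly f times
            List<List<String>> buckets = new ArrayList<>();
            for (int i = 0; i <= words.length; i++) buckets.add(new ArrayList<>());
            for (Map.Entry<String, Integer> e : freq.entrySet())
                buckets.get(e.getValue()).add(e.getKey());
            List<String> ans = new ArrayList<>();
            for (int f = words.length; f >= 1 && ans.size() < k; f--) {
                List<String> bucket = buckets.get(f);
                Collections.sort(bucket);         // the sort we cannot avoid
                for (String w : bucket) {
                    if (ans.size() == k) break;
                    ans.add(w);
                }
            }
            return ans;       // e.g. ["aaa","aa","a"], k = 1 -> ["a"]
        }
    }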


  • 0
    V

    @dumbness Correct, I agree with you. That's why I said "on average"... but admittedly "on average" is pretty vague here...

