Simple and Clean Python code, O(logK) for getting news feed


  • 9
    import heapq
    
    class Twitter(object):
    
        def __init__(self):
            self.time = 0
            self.tweets = {}
            self.followee = {}
            
    
        def postTweet(self, user, tweet):
            self.time += 1
            self.tweets[user] = self.tweets.get(user, []) + [(-self.time,  tweet)]
            
            
    
        def getNewsFeed(self, user):
            h, tweets = [], self.tweets
            people = self.followee.get(user, set()) | set([user])
            for person in people:
                if person in tweets and tweets[person]:
                    time, tweet = tweets[person][-1]
                    h.append((time, tweet, person, len(tweets[person]) - 1))
            heapq.heapify(h)
            news = []
            for _ in range(10):
                if h:
                    time, tweet, person, idx = heapq.heappop(h)
                    news.append(tweet)
                    if idx:
                        new_time, new_tweet = tweets[person][idx-1]
                        heapq.heappush(h, (new_time, new_tweet, person, idx - 1))
            return news
            
            
    
        def follow(self, follower, other):
            self.followee[follower] = self.followee.get(follower, set()) | set([other])
            
            
    
        def unfollow(self, follower, other):
            if follower in self.followee:
                self.followee[follower].discard(other)
    

    K is the number of followee of user. We have O(log(K)) runtime for getting news feed because we do maximum 10 extractions in a heap that holds maximum K elements (similar to what is done in merge K linked lists). The other ops are obviously O(1).


  • 0

    I do not think this will work well in a real system. You need to consider that data are actually stored on a database. The algorithm used in merge K lists will access DB many times. We can just pull 10 tweets from each followed user and find the top K tweets using any algorithm. The time spent on memory is much less than the time spent on DB reading which is reading the physical disk.


  • 0

    @jedihy I don't get what you're saying. You can cache the database in main memory... Besides, what you just describe is not less than O(log(K)).


  • 0

    Yes but cache is expensive. The way I just describe is greater than O(log(K)) obviously but it does not matter. It will perform exactly 10 PULL operations to get tweets. Yours will do at most 20 PULL operations. The overhead on selecting the top K elements could be ignored compared to DB reading.

    Anyway, as you said, we can cache these in memory. I probably overthink it.


  • 2
    L

    How is this logN? When every add and pop operation to the heap queue is logN?


  • 0

    @lekzeey have you read the comments after the code in my post?


  • 2
    L

    @agave I did, I believe that is incorrect. Because every add operation into the heap is log(N) and every pop operation is log(N) and depending on how many elements are in the queue, that is Nlog(N). Merge K linked lists has an optimum algorithm of Nlog(N) as well.


  • 0
    W

    @lekzeey

    Agree.


  • 3
    H

    I don't think heapify is O(logn) operation.
    Yes, insert and remove operation for heap only need O(logn). But what you did is to build a max-heap from an unsorted (but distinct) sequential container, the time complexity is O(n).

    Anyway, thanks for your post. I posted an algorithm in C++ days ago, and I used the same method as you. The time complexity is O(n), and I'm shocked when seeing you can achieve O(logn) for getting news feed.

    Please correct me if I'm wrong.


  • 0

    I agree the push pop operations are O(logK) individually, where K is the number of followees, because we only need to get 10 most recent posts which means the total time costs are (c10logK), where c is a constant number. However, heapify a heap which has K items inside, will takes O(K) time. Therefore, the time complexity becomes to compare which number will be larger: 10c*logK or K.


  • 0

    Here for "heapify a heap" I mean construct a new heap from scratch, which normally takes O(n) time.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.