C++ solution with Detailed Explanations Using Indexed Priority Queue (i.e., HashMap + Priority Queue)


  • 5
    W

    Here is my solution to implement a Least Frequently Used Cache in C++.

    • The idea is straightforward. We maintain a min-PriorityQueue with the least frequently used element at the top. However, when we access any existing element using get() or set(), its usage frequency should be increased by one, which forces us to change its position in the priority queue (re-heapify). Unfortunately, native STL PriorityQueue does not support this operation. Therefore, we need to create our own priority queue and modify it. I found the so-called Index Priority Queue is suitable for this task (believe me, it is very simple to grasp. Use it to impress your interviewer!). It maintains not only a priority queue but also a hash map, "indexMap", that maps the key of an element to its position (index) in the priority queue. Now, we can quickly access any element in the priority queue and re-heapify the PriorityQueue, when the usage frequency of an element changes.

    • Another tricky point is that when we need to evict an element, but multiple elements have the same (minimum) usage frequency, we need to evict the least recently used (the oldest) element. To handle this, I maintain a time-stamp variable for each element in the LFU Cache, which indicates the latest time stamp when we access it. Therefore, we maintain the following invariant in the priority queue: When two elements have the same usage frequency, the least recently used one will always be closer to root node. When there are multiple least frequently used elements, we always retrieve the one closer to the root.

    class LFUCache {
    public:
        struct Node {
            int key; // key of the element.
            int val; // value of the ement.
            int fre; // usage frequency
            int timeStamp; // the latest time stamp when this element is accessed.
            Node(): key(-1), val(-1), timeStamp(-1), fre(0) {}
            Node(int k, int v, int ts): key(k), val(v), timeStamp(ts), fre(1) {}
        };
    
        LFUCache(int capacity) {
            Cap = capacity;
            Node* dummy = new Node();
            pq.push_back(dummy); // The pq start from pq[1].
            ts = 0;
        }
        
        int get(int key) {
            if(!mp.count(key)) return -1;
            int index = mp[key];
            int val = pq[index]->val;
    	pq[index]->fre++;
    	pq[index]->timeStamp = ++ts;
            sink(index);
            return val;
        }
        
        void set(int key, int value) {
            if(Cap <= 0) return;
    	if(mp.count(key)) {
    	   int index = mp[key];
    	   pq[index]->val = value;
    	   get(key);
    	}
    	else {
    	    if(pq.size() - 1 == Cap) {
    	        int oldKey = pq[1]->key;
    		mp.erase(oldKey);
    		Node* newnode = new Node(key, value, ++ts);
    		pq[1] = newnode;
    		mp[key] = 1;
    		sink(1);
    	    }
    	    else {
    	        Node* newnode = new Node(key, value, ++ts);
    		pq.push_back(newnode);
    		mp[key] = pq.size() - 1;
    		swim(pq.size() - 1);
    	    }
    	}
        }
        
    private:
    	vector<Node*> pq; // A priority queue, with the least usage frequency and least recently used element at the top.
    	unordered_map<int, int> mp; // A mapping from the key of the element to its index in the priority queue.
    	int Cap; // Capcity of the cache
    	int ts; // time-stamp: indicate the time stamp of the latest operation of an element. According to the requirement of LFU cache, when we need to evict an element from the cache, but there are multiple elements with the same minimum frequency, then the least recently used element should be evicted.
    
        /*
         * Recursively sink a node in priority queue. A node will be sinked, when its frequency is larger than any of its
         * children nodes, or the node has the same frequency with a child, but it is recently updated. 
         */
    	void sink(int index) {
    	    int left = 2 * index, right = 2 * index + 1, target = index;
    	    if(left < pq.size() && pq[left]->fre <= pq[target]->fre) // If the left child has the same frequency, we probably need to swap the parent node and the child node, because the parent node is recently accessed, and the left child node was accessed at an older time stamp.
                   target = left;
                if(right < pq.size()) { 
                    if(pq[right]->fre < pq[target]->fre || (pq[right]->fre == pq[target]->fre && pq[right]->timeStamp < pq[target]->timeStamp)) // If right child has the same frequency and an older time stamp, we must swap it.
                         target = right;
    		}
    		if(target != index) {
    		    myswap(target, index);
    	            sink(target);
    		}
    	}
        
        /*a
         * Recursively swim a node in priority queue. A node will be swimmed, when its frequency is less than its
         * parent node. If the node has the same frequency with its parent, it is not needed to be swimmed, because
         * it is recently accessed.
         */
    	void swim(int index) {
    	    int par = index / 2;
    	    while(par > 0 && pq[par]->fre > pq[index]->fre) {
    	        myswap(par, index);
    		index = par;
    		par /= 2;
    	    }
    	}
    
    	void myswap(int id1, int id2) {
    	    swap(pq[id1], pq[id2]);
    	    mp[pq[id1]->key] = id1;
    	    mp[pq[id2]->key] = id2;
    	}
    };
    

  • 0
    M

    @wangxinbo You are the fastest C++ solution so far. It only takes 170 ms!


  • 0
    J

    @mumu515 The testcases are just too weak..You can try ur own test and see the difference with other O(1) algorithms.


  • 1
    W

    @j20120307 Yes, I agree with you. My solution bears O(logN) time complexity. But I think the indexed priority queue is a good data structure for this task.


  • 0
    D

    @wangxinbo

    Why do you need dummy node at pq[0]?


  • 2
    W

    @dadreddy It is just a little rite of accessing child nodes and parent node in an array-based binary tree: for a node indexed at i, the index of its left child is 2 * i, the index of its right child is 2 * i + 1, and the index of its parent node is i / 2, no matter i is even or odd. Note that the priority queue can always start at pq[0], but then, the left child of pq[0] will be 2 * 0 + 1, and its right child will be 2 * 0 + 2. See the difference? A lot of people think the second representation need more code, I think that's the reason : P


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.