Your design should consider removal performance


  • 10
    K

    Since this is a design question, we need to ask the interviewer how this class is going to be used.
    Working code alone is not the answer to this question; what matters is how you adjust your program to meet different use cases.

    Consider: there are 1000 frequent hit() calls followed by a single getHits(). If we only do removal in the getHits() function, that one call becomes very time-consuming. I prefer to do removal in both hit() and getHits(), so the program avoids a lag spike in this case; a sketch is below.
    This is important when you design a time-critical system.
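
    A rough sketch of what I mean, assuming the plain queue-of-timestamps approach and a fixed 300-second window (class and method names just follow the problem statement):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class HitCounter {
        // Timestamps of hits inside the last 300 seconds, oldest first.
        private final Deque<Integer> hits = new ArrayDeque<>();

        /** Record a hit at the given timestamp (seconds granularity). */
        public void hit(int timestamp) {
            evictOld(timestamp);   // removal on the write path too
            hits.addLast(timestamp);
        }

        /** Return the number of hits in the past 5 minutes. */
        public int getHits(int timestamp) {
            evictOld(timestamp);   // removal on the read path
            return hits.size();
        }

        // Drop everything that has fallen out of the 300-second window.
        private void evictOld(int timestamp) {
            while (!hits.isEmpty() && timestamp - hits.peekFirst() >= 300) {
                hits.removeFirst();
            }
        }
    }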


  • 1

    Good point about the lag. Another reason would be space. In two of my solutions I removed in both functions as well, allowing me to use a circular buffer of fixed capacity 300.
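
    Roughly along these lines, as a sketch (not my exact code; each of the 300 slots holds the hit count for one second, and a slot is reclaimed once its timestamp falls out of the window):

    public class HitCounter {
        // Circular buffer: one slot per second of the 300-second window.
        private final int[] times = new int[300];
        private final int[] counts = new int[300];

        /** Record a hit at the given timestamp (seconds granularity). */
        public void hit(int timestamp) {
            int idx = timestamp % 300;
            if (times[idx] != timestamp) {
                // The slot still holds an expired second: reclaim it.
                times[idx] = timestamp;
                counts[idx] = 1;
            } else {
                counts[idx]++;
            }
        }

        /** Return the number of hits in the past 5 minutes. */
        public int getHits(int timestamp) {
            int total = 0;
            for (int i = 0; i < 300; i++) {
                // Only slots whose timestamp is still inside the window count.
                if (timestamp - times[i] < 300) {
                    total += counts[i];
                }
            }
            return total;
        }
    }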


  • 2
    A

    Actually it won't be time-consuming, provided, of course, you aren't using a queue of all timestamps (I think that is a total mistake).
    You should track counts and clean them up accordingly; for example, you can choose periodic garbage collection:

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class HitCounter {
        // Running total of hits that are still inside the window.
        int totalCounter = 0;
        // Per-timestamp hit counts, kept in insertion (i.e. chronological) order.
        LinkedHashMap<Integer, Integer> bucketMap = new LinkedHashMap<>();
        // Oldest timestamp that may still be inside the window.
        int minTimestamp = 0;

        /** Initialize your data structure here. */
        public HitCounter() {}

        /** Drop buckets that have fallen out of the 300-second window. */
        void cleanup(int timestamp) {
            Iterator<Map.Entry<Integer, Integer>> entries = bucketMap.entrySet().iterator();
            while (entries.hasNext()) {
                Map.Entry<Integer, Integer> me = entries.next();
                if (timestamp - me.getKey() >= 300) {
                    // Bucket expired: remove it and subtract its count from the total.
                    totalCounter -= me.getValue();
                    entries.remove();
                } else {
                    // Buckets are chronological, so everything after this one is still valid.
                    minTimestamp = me.getKey();
                    break;
                }
            }
        }

        /** Record a hit.
         @param timestamp - The current timestamp (in seconds granularity). */
        public void hit(int timestamp) {
            // TODO extract the 300-second window into a configuration parameter
            if (timestamp > minTimestamp + 300) {
                cleanup(timestamp);
            }

            bucketMap.put(timestamp, bucketMap.getOrDefault(timestamp, 0) + 1);
            totalCounter++;
        }

        /** Return the number of hits in the past 5 minutes.
         @param timestamp - The current timestamp (in seconds granularity). */
        public int getHits(int timestamp) {
            cleanup(timestamp);
            return totalCounter;
        }
    }
    

  • 0

    Funny how you call it a total mistake and then right away make it yourself :-)
    If I call you with hit(1), hit(2), ..., hit(n), then you do keep piling up all those hits. And when I finally call getHits(n+300), your poor getHits does have to clean up all those n hits at once, causing the described lag.


  • 2
    A

    No, I don't pile up all hits, I count them.

    And garbage collection will lag only long enough to clean up 300 records. If there are ~1,000,000 timestamps per second, it still only needs to remove buckets, not the actual timestamps.

    In my case the amortized cost of cleanup is constant (300); in the case of recording all timestamps and returning the queue size (there are such responses), the cost is O(N).

    This task is about tradeoffs and optimization, not about knowing what a queue is. You always need cleanup in getHits, but cleanup in hit is optional, and I'm not at all sure we should do cleanup there if hit is on the hot path; at least that would be my strategy in answering this question in an interview.

    It seems like you don't understand that the main caveat of this problem (at least in an interview setting, with a whiteboard and a conversation about the pros and cons of different solutions) is not hit(1), hit(2), ... but
    hit(1) thousands of times, then hit(2) thousands of times.


  • 0

    You do pile up all my hits. (Along with counts, all being 1, yes, so?)

    No idea why you're talking about "knowing what a queue is" now.

    And if I hit each of my timestamps thousands of times, you still keep piling them up (with their counts) until that getHits call has to clean them all up at once. All n, not just 300.


  • 0
    A

    OK, I showed my code.
    Please show me where I pile up all the hits.

    After running hit(1) 300 times the map will contain
    1: 300

    Then, if there is a hit(2), the map will contain

    1: 300
    2: 1
    

    After getHits, the code will have gone through the expired keys and deleted the corresponding entries.
    There is no O(N) complexity, because the number of cleanup operations is unrelated to the number N of hits. We can say that the getHits complexity is O(number of distinct timestamps), which is constant by the task description.

    Adding a hit in my solution is O(1).


  • 0

    I said "those hits" and "my hits", specifically referring to my "hit(1), hit(2), ..., hit(n). You do pile up all of those.

    When I then went with your scenario of hitting each timestamp thousands of times, I said you're piling up those timestamps (with their counts). Which you do.

    Apologies for reusing "n" and not stating more clearly that that now just means number of timestamps.

    Still, one call of your getHits might have to clean up far more than 300 timestamps. Like you say, it's O(number of distinct timestamps). But I have no idea why you think that's constant. Where does the task description say that?


  • 0
    A
    This post is deleted!

  • 0

    Um... before I even started talking here, I had successfully tested and verified that your solution ran far more than 300 loop iterations in a single call. As many as I wanted it to. So no, 300 is not your worst case.


  • 0
    A

    Now I see what you meant. I was correct in evaluating the complexity as O(number of distinct timestamps), but I was then misled by all those 300s. Of course, if you have a very long run of unique timestamps without any getHits operation, there can be the problem you described, because they were never cleaned up.

    I have a feeling that for such a case periodic garbage collection (e.g. run cleanup if the previous cleanup was more than 1000 timestamps ago) will be more efficient than cleaning up on each add operation, but that would need to be tested; a sketch is below.
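
    A minimal sketch of that idea, reusing bucketMap, totalCounter and cleanup() from my code above (the 1000-timestamp threshold is arbitrary and would have to be tuned):

    // Additional field on the HitCounter above.
    int lastCleanupTimestamp = 0;

    public void hit(int timestamp) {
        // Sweep the map only if the previous sweep was long enough ago,
        // instead of checking/cleaning on every single hit.
        if (timestamp - lastCleanupTimestamp > 1000) {
            cleanup(timestamp);
            lastCleanupTimestamp = timestamp;
        }

        bucketMap.put(timestamp, bucketMap.getOrDefault(timestamp, 0) + 1);
        totalCounter++;
    }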


  • 1
    K

    Haha, I do think this is a very good/interesting interview question now.


  • 0

    One follow-up: if the duration is much longer than 300 seconds, say 3 days instead, does our method handle space growth efficiently? Is sub-linear growth possible?

    Like the OP said above, it is open-ended and depends on each use case.


  • 0
    A

    @StefanPochmann Can you post your answer for this question?


  • 0
    Y

    @lucascho what is the answer for your assumption?


  • 0

    @andy.galkin2 I agree with a self-scheduled periodic cleanup operation. It's going to be a disaster if large amounts of cleanup are triggered by hit()/getHits() simultaneously. For example, if you call hit(1), hit(2), ..., hit(300) and then large numbers of hit(700)s and getHits(1001)s arrive simultaneously, the program crashes. BTW it would be great to have the map be synchronized; see the sketch below.
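
    A minimal sketch of that, assuming the LinkedHashMap-based counter from the earlier post (the class name is just for illustration); simply marking both methods synchronized also protects totalCounter and the iteration during cleanup:

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SynchronizedHitCounter {
        private int totalCounter = 0;
        private final LinkedHashMap<Integer, Integer> bucketMap = new LinkedHashMap<>();

        /** Record a hit; synchronized so concurrent hit()/getHits() calls don't corrupt the map. */
        public synchronized void hit(int timestamp) {
            bucketMap.put(timestamp, bucketMap.getOrDefault(timestamp, 0) + 1);
            totalCounter++;
        }

        /** Return hits in the past 5 minutes, removing expired buckets under the same lock. */
        public synchronized int getHits(int timestamp) {
            Iterator<Map.Entry<Integer, Integer>> it = bucketMap.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Integer, Integer> e = it.next();
                if (timestamp - e.getKey() >= 300) {
                    totalCounter -= e.getValue();
                    it.remove();
                } else {
                    break;   // entries are in chronological order
                }
            }
            return totalCounter;
        }
    }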

