Can someone explain why it is slower to use a single hashset to track the numbers than using hash and linkedlist?


  • 1

    I wrote the solution in Java using only one hashset intuitively. It was accepted for the first time (but no longer accepted later). It costs more than 800ms. That was really awful.
    Could someone explain the difference of those 2 methods here?
    Thank you !


  • 1

    Sorry, my crystal ball is broken, now I can't even find out the language you used, let alone how you used the hashset/hash/linkedlist.


  • 0
    This post is deleted!

  • 0
    This post is deleted!

  • 0

    @StefanPochmann

    Okay. Here is my TLE or MLE code. (Accepted for the first submission).

    public class PhoneDirectory {
    HashSet<Integer> dictionary;
    /** Initialize your data structure here
        @param maxNumbers - The maximum numbers that can be stored in the phone directory. */
    public PhoneDirectory(int maxNumbers) {
        dictionary = new HashSet<Integer>();
        for(int i = 0;i<maxNumbers;i++){
            dictionary.add(i);
        }
    }
    
    /** Provide a number which is not assigned to anyone.
        @return - Return an available number. Return -1 if none is available. */
    public int get() {
        Iterator iter = dictionary.iterator();
        if(iter.hasNext()){
            int temp = (int)iter.next();
            dictionary.remove(temp);
            return temp;
        }
        else{
            return -1;
        }
    }
    
    /** Check if a number is available or not. */
    public boolean check(int number) {
        if(dictionary.contains(number)==true) return true;
        else return false;
    }
    
    /** Recycle or release a number. */
    public void release(int number) {
        dictionary.add(number);
    }
    

    }


  • 1

    I'm not 100% certain, but I suspect iterating the HashSet is the problem, even if you're just iterating to the first element. If the data is additionally also stored in a linked list, iteration is probably much simpler and faster. There's btw a class LinkedHashSet. Using that instead of HashSet gets your code accepted in about 600 ms (except for an occasional Memory Limit Exceeded that I don't know about).

    Btw,

        if(dictionary.contains(number)==true) return true;
        else return false;
    

    is pretty silly.


  • 5

    Ok I did an experiment with the iteration. I added this at the top of get():

            for (int i=0; i<100; i++) {
                Iterator iter = dictionary.iterator();
                if (iter.hasNext())
                    iter.next();
            }
    

    And then I used "Run Code" with the test case where your solution exceeded the time limit. The runtimes in four attempts were: 487 516 602 561. Average 541.5 ms.

    Then I tried again, but using TreeSet. The times dropped to: 152 202 167 167. Average 172.0 ms.

    Then I tried the aforementioned LinkedHashSet. The times dropped further: 162 133 141 163. Average 149.75 ms.

    Then, to get a baseline, I removed that extra code again and times were: 132 119 137 127. Average 128.75 ms.

    So yeah, looks like iteration, even to just the first element, is very costly for a HashSet.


  • 0

    @StefanPochmann Thank you for testing the code and giving those suggestions, Stefan.
    And for the silly stuff, yea, it is really silly. I am still a noob though. My coding style was even worse and I am trying to learn how to write concise and pretty code. ;)


  • 0

    @Nakanu
    I got the same issue. The problem is that the iterator of HashSet is more expensive than TreeSet or LinkedHashSet due to the baseline construction of the set.
    I think that TreeSet and LinkedHashSet maintain a "linked" order between elements during the each operation. However, the HashSet need to create the "link" when iterator is called.
    Check the official doc of LinkedHashSet:

    This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set. (An element e is reinserted into a set s if s.add(e) is invoked when s.contains(e) would return true immediately prior to the invocation.)


  • 0

    @yubad2000 A bit further down, that page also says:

    Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.

    Btw, LeetCode uses Java 8. Most of the time it doesn't matter, but sometimes it does.


  • 0

    @StefanPochmann
    In this problem, the case is that we only need to get the FIRST element. Therefore, HashSet has the disadvantage of going through proportional element of it capacity to construct the link, while LinkedHashSet uses only the first proportion of the list which might be already constructed during the operation.


  • 0

    @yubad2000
    Thank you for sharing your opinions.
    I have checked the doc of HashSet, LinkedHashSet, and TreeSet myself too.
    As what you said, both TreeSet and LinkedHashSet have elements "linked" because LinkedHashSet keeps the order of adding objects and TreeSet keeps the objects in ascending order by default.
    But HashSet only uses hash function to compute where to store the object and its order is not fixed.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.