Why does a HashMap word count get a time penalty compared to a bucket word count?


  • 0

for this problem you need to do a word count. one way is as follows, using a bucket (an int array indexed by letter):

    public class Solution {
        public int firstUniqChar(String s) {
            int[] l_en = new int[26];   // counts for 'a'..'z'
            for (char ch : s.toCharArray()) {
                l_en[ch - 'a']++;
            }

            for (int i = 0; i < s.length(); i++) {
                if (l_en[s.charAt(i) - 'a'] == 1) {
                    return i;
                }
            }
            return -1;
        }
    }
    

    the other way is as follows, using a HashMap<Character, Integer>:

    public class Solution {
        public int firstUniqChar(String s) {
            HashMap<Character, Integer> l_map = new HashMap<Character, Integer>();
            char[] l_ordered = new char[s.length()];
            int uniq_cnt = 0;

            for (int i = 0; i < s.length(); i++) {
                if (!l_map.containsKey(s.charAt(i))) {
                    l_map.put(s.charAt(i), 1);
                    uniq_cnt++;
                } else {
                    l_map.put(s.charAt(i), l_map.get(s.charAt(i)) + 1);
                }
            }

            for (int i = 0; i < s.length(); i++) {
                if (l_map.get(s.charAt(i)) == 1) {
                    return i;
                }
            }

            return -1;
        }
    }
    

    it is trivial to get the first unique char in s once you have the counts. the running times of the above two implementations are significantly different: the HashMap way is much slower than the bucket.

    the hashmap takes 162 ms (the actual running time can vary on your end when you run it, but should be around 162 ms). (screenshot of the 162 ms result)

    but for the bucket solution, the running time is only 20 ms. (screenshot of the 20 ms result)

    so the question is: why? does the built-in HashMap inherently take more time at run time?

    Edit: added the complete code for both solutions, and a screenshot of the running time of each


  • 0

    How "significantly" and "way slower"? Don't you have numbers you could show?

    And you really shouldn't use different loop styles. Use an enhanced for loop for both.

    Also, it would be good to post your whole code, so that we can test it ourselves.


  • 0

    hi @StefanPochmann, thanks for your reply, I just added the complete code and the running results from my end


  • 0

    You're still using different loop styles. Why? Do you not care that the many length and charAt calls could play a significant role? To get a meaningful comparison, you should remove any unnecessary extra differences.

    After changing to an enhanced for loop, removing the unused (!) l_ordered and uniq_cnt, avoiding the double lookup by using getOrDefault, and using a good initial capacity, the whole counting part looks like this:

        HashMap<Character, Integer> l_map = new HashMap<>(64);
        for (char ch : s.toCharArray())
            l_map.put(ch, l_map.getOrDefault(ch, 0) + 1);
    

    With that, I get it accepted in around 90 ms. Much better, but still a lot slower than using an int array (not "bucket"). That shouldn't surprise at all, though:

    • A hash table is a rather complicated higher level data structure compared to a low level array, involving much more work for every operation. I mean, just look at what put for example goes through.
    • The overhead of calling functions (get and put).
    • The translation from char values to Character objects.
    • HashMap is written in Java, while the arrays might be handled more directly by lower level C/C++ code (not entirely sure about this one).
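
    All of these effects can be seen in a rough micro-benchmark. The following is only a sketch (`CountBench`, `countWithArray`, and `countWithMap` are names made up for this example), and absolute numbers will vary with JIT warm-up and hardware:

```java
import java.util.HashMap;

public class CountBench {
    // int-array ("bucket") counting, as in the first solution
    static int countWithArray(String s) {
        int[] cnt = new int[26];
        for (char ch : s.toCharArray()) cnt[ch - 'a']++;
        for (int i = 0; i < s.length(); i++)
            if (cnt[s.charAt(i) - 'a'] == 1) return i;
        return -1;
    }

    // HashMap counting, using getOrDefault as suggested above
    static int countWithMap(String s) {
        HashMap<Character, Integer> map = new HashMap<>(64);
        for (char ch : s.toCharArray())
            map.put(ch, map.getOrDefault(ch, 0) + 1);
        for (int i = 0; i < s.length(); i++)
            if (map.get(s.charAt(i)) == 1) return i;
        return -1;
    }

    public static void main(String[] args) {
        // build a long string where 'q' is the only unique character
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) sb.append("abcabc");
        sb.append('q');
        String s = sb.toString();

        long t0 = System.nanoTime();
        int a = countWithArray(s);
        long t1 = System.nanoTime();
        int b = countWithMap(s);
        long t2 = System.nanoTime();

        System.out.println("array result=" + a + " time=" + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("map   result=" + b + " time=" + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

    Both return the same answer; only the counting structure differs, which isolates the HashMap overhead.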

  • 0

    @StefanPochmann First, great thanks for your input, greatly helpful. After looking at the JDK link, your points are clearer:

    1. Good point on the double look-up!! getOrDefault directly gets the node by calculating the key hash, which is the same thing containsKey does to check whether the key exists in any node:

      for getOrDefault

       @Override
       public V getOrDefault(Object key, V defaultValue) {
           Node<K,V> e;
           return (e = getNode(hash(key), key)) == null ? defaultValue : e.value;
       }
      

      for containsKey

       public boolean containsKey(Object key) {
           return getNode(hash(key), key) != null;
       }
      

      if the key does not exist, both versions call put to set 1 for the new key and work the same; but for an existing key, the extra get causes an extra look-up.

    2. Indeed, the whole hashmap and all operation associated are expensive compared to the direct array manipulation.

    3. Not sure about the final point, but I wouldn't believe arrays are handled by C/C++ code; isn't it all universally compiled into bytecode and executed by the JVM?

    4. Also, this one line takes like 10ms

      HashMap<Character, Integer> l_map = new HashMap<Character, Integer>();
      

    (just added this for fun in front of the array implementation, and run time goes up by about 10 ms)
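
    The double look-up from point 1 can also be made visible directly, by counting probes with a small instrumented subclass. This is only a sketch (`CountingMap` and `LookupDemo` are hypothetical names for this example):

```java
import java.util.HashMap;

// Counts how many hash look-ups each counting style performs.
class CountingMap extends HashMap<Character, Integer> {
    int lookups = 0;

    @Override
    public Integer get(Object key) { lookups++; return super.get(key); }

    @Override
    public boolean containsKey(Object key) { lookups++; return super.containsKey(key); }

    @Override
    public Integer getOrDefault(Object key, Integer dflt) { lookups++; return super.getOrDefault(key, dflt); }
}

public class LookupDemo {
    public static void main(String[] args) {
        String s = "aaaaabbbbb";  // 10 chars, 2 distinct letters

        // containsKey + get style: two look-ups for every existing key
        CountingMap m1 = new CountingMap();
        for (char ch : s.toCharArray()) {
            if (!m1.containsKey(ch)) m1.put(ch, 1);
            else m1.put(ch, m1.get(ch) + 1);
        }

        // getOrDefault style: exactly one look-up per character
        CountingMap m2 = new CountingMap();
        for (char ch : s.toCharArray())
            m2.put(ch, m2.getOrDefault(ch, 0) + 1);

        // 10 containsKey + 8 get = 18, versus 10 getOrDefault
        System.out.println(m1.lookups + " look-ups vs " + m2.lookups + " look-ups");
    }
}
```

    As Stefan notes above, in a long string almost every key already exists, so the containsKey/get style pays the double look-up nearly every time.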


  • 0
    1. Good idea to look up and show the source code of those functions, I didn't because I somewhat knew what they do, but it's nice to see them here. I'd just add that small strings don't cost much time anyway, and that in large strings, most of the time the key does already exist, so most of the time you do have the double look-up. At most 26 times the key doesn't exist yet, so for example in a string of length 10000, at least 9974 times the key already does exist and you'd have the containsKey/get combination.

    2. (nothing to say, but the forum software turns my "3." into a "2." so I need a placeholder :-)

    3. Yeah I don't really know the internals of that and I have trouble expressing what I mean. I guess I want to say that the HashMap is an extra layer of Java code that needs to be run, while the array is as close to the underlying layer as possible. The functionality of arrays is defined in the language/JVM, while the functionality of the HashMap is defined in the HashMap class. For example I'd expect the code for checking array index validity to be in the JVM's C++ code, while the code for checking HashMap key validity is in the class's Java code.

    4. Hmm, I just tried that three times and got accepted in 19 ms, 26 ms, 20 ms. Did you submit it only once, or did you consistently get a 10 ms increase in several attempts?


  • 0

    @StefanPochmann

    "HashMap is an extra layer of Java code that needs to be run, while the array is as close to the underlying layer as possible" -- to the point, totally agree.

    You are right, the running time is actually steadily around 20 ms; it looks like it was just an accidental 10 ms increase, which, I just realized, should make sense: there isn't much done in the constructor since I didn't pass any parameters, so only the bare minimum hash table is created by HashMap<K,V>() (and, as the javadoc from here says, it "Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75)")

    above all, huge thanks to you @StefanPochmann for your great input and the conversation.


  • 1

    Yeah, that constructor doesn't do much, but loading the class could still take a little time. I don't think it's significant, though. But... try adding UnaryOperator<Integer> identity = i -> i; instead. That also pretty much doesn't do anything, but increases the runtime by about 70 ms! Because it takes a lot of time to load Java's lambda support or so.
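
    The one-time cost of loading the lambda machinery can be sketched like this (`LambdaWarmup` is a made-up name, and the timings are illustrative only, so no fixed numbers are claimed):

```java
import java.util.function.UnaryOperator;

public class LambdaWarmup {
    public static void main(String[] args) {
        // The first lambda created in the JVM triggers loading of the
        // invokedynamic/LambdaMetafactory machinery behind lambdas.
        long t0 = System.nanoTime();
        UnaryOperator<Integer> identity = i -> i;
        long t1 = System.nanoTime();

        // A second lambda reuses the already-loaded machinery.
        UnaryOperator<Integer> addOne = i -> i + 1;
        long t2 = System.nanoTime();

        System.out.println("first  lambda: " + (t1 - t0) / 1_000_000.0 + " ms");
        System.out.println("second lambda: " + (t2 - t1) / 1_000_000.0 + " ms");
        System.out.println(identity.apply(7) + " " + addOne.apply(7));
    }
}
```

    The first measurement is typically far larger than the second, which matches the ~70 ms jump described above.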

