Are hash tables ok here? They're not really O(1), are they?


  • 8

    I predict that many of us will think of using a hash table, but I'm not aware of any that really offer the needed operations in O(1). Are there any?

    We do usually think of them as O(1), and I think that's ok because in practice they achieve average/amortized O(1), everybody knows what is meant (right?), and they're usually just a small building block in the actual algorithm. It would be pretty annoying if we always had to asterisk the complexity discussion of every algorithm that uses hash tables. But here, the set data structure isn't just a helping part of the actual thing we're building. It is the actual thing. And we're explicitly required to achieve O(1); it's even in the problem title. So I think we shouldn't be sloppy here.

    Update: The problem has been updated to say "average".


  • 6

    @StefanPochmann In an interview setting, it's perfectly reasonable to assume hash table add/delete operations are O(1). Only if you are explicitly asked about the detailed implementation of a hash table should you mention that in the real world, a hash table could degrade to O(n) due to collisions (though that's unlikely if you choose a good hashing algorithm).

    A hash table is seen as a basic building block and is used very often to build efficient solutions to interview problems. Without assuming the mentioned operations are O(1), most problems which could be solved efficiently with a hash table would have the same complexity as brute-force approaches.


  • 0

    @1337c0d3r said in Are hash tables ok here? They're not really O(1), are they?:

    A hash table is seen as a basic building block and is used very often to build efficient solutions to interview problems. Without assuming the mentioned operations are O(1), most problems which could be solved efficiently with a hash table would have the same complexity as brute-force approaches.

    Yes, I know, that's what I meant. But again, this isn't "most problems". This problem is specifically about this little data structure and we're specifically requested to provide O(1). Not average O(1), not amortized O(1). Just O(1).

    But ok, I'll just accept it and use one, too. I'll probably even use a list in addition, which doesn't really support O(1) appends but only amortized O(1). Is that ok as well?
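
    For concreteness, here's a minimal sketch (my own, not anyone's posted solution) of that hash-table-plus-list combination, with the usual swap-with-last trick so the list removal is O(1) too:

    ```cpp
    // Sketch of the usual approach: a hash map from value to its index in a
    // vector. insert/remove are average (and partly amortized) O(1), getRandom
    // is O(1). Just an illustration, not anybody's posted solution.
    #include <cstdlib>
    #include <unordered_map>
    #include <vector>

    class RandomizedSet {
        std::vector<int> vals;             // the stored elements
        std::unordered_map<int, int> pos;  // value -> index in vals
    public:
        bool insert(int val) {
            if (pos.count(val)) return false;        // already present
            pos[val] = static_cast<int>(vals.size());
            vals.push_back(val);                     // amortized O(1)
            return true;
        }
        bool remove(int val) {
            auto it = pos.find(val);                 // average O(1)
            if (it == pos.end()) return false;
            // Overwrite the removed slot with the last element, then pop the back.
            int idx = it->second;
            int last = vals.back();
            vals[idx] = last;
            pos[last] = idx;
            vals.pop_back();
            pos.erase(it);
            return true;
        }
        int getRandom() {
            return vals[std::rand() % vals.size()];  // assumes the set is non-empty
        }
    };
    ```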


  • 0

    I think we are all talking about amortized complexities here.


  • 0

    @StefanPochmann How about if the description is changed to mention amortized O(1)? Does that sound ok?


  • 0

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:

    ... we're specifically requested to provide O(1). Not average O(1), not amortized O(1). Just O(1).

    Putting amortized O(1) and O(1) together is like mixing apples and oranges. I agree with Stefan, it should be specified in the problem statement.


  • 0
    K

    @StefanPochmann Well, since you know that you are storing 32-bit ints, you could always implement a vEB tree or a y-fast trie if you really wanted all of the operations to take O(1) time, instead of using a hash table with amortized O(1) time.


  • 2

    @kevin36 Yeah but I don't like that argument. You don't even need a fancy data structure then. Just use a simple vector. Since we don't store duplicates, it'll contain at most 2^32 elements. So scanning it front to back takes O(1).


  • 4

    @1337c0d3r said in Are hash tables ok here? They're not really O(1), are they?:

    @StefanPochmann How about if the description is changed to mention amortized O(1)? Does that sound ok?

    I think "average" would be the proper term here.

    I see this problem with "amortized": Let's say we somehow managed to create the worst case, i.e., we built a hash table with n elements so that checking for a certain value takes n steps. Then I can ask for that value over and over again and thus it'll take n steps over and over again. So it's not amortized O(1). Unless the hash table reacts to that somehow, i.e., reorganizes itself just because of the checking, not because of any insertions or deletions (because I'm just checking, not inserting or deleting). I don't know whether that's usually done, but I doubt it.

    "Average" on the other hand works in a different way. The above shenanigans can of course still happen, but since they're unlikely, they don't hurt the average O(1).

    Also, both cppreference.com and cplusplus.com say "average" but not "amortized". Wikipedia's hash table article, in the top-right overview box, also says "average" and not "amortized". The whole article does say "amortized" a few times, but I think that's because it discusses a lot of different hashing ways and aspects, and some of those involve amortized costs. The Python time complexity page also says "average" (only for x in s, it doesn't list add/remove). Java's HashMap page says neither "average" nor "amortized" but does say "assuming the hash function disperses the elements properly", which sounds like average case analysis to me.
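
    To make that worst case concrete, here's a contrived little experiment (my own illustration, not from any of those pages): a hash functor that sends every key to the same bucket, so each lookup of an absent key walks the whole chain, and repeating the lookup never gets cheaper.

    ```cpp
    // Contrived worst case: every key hashes to 0, so all elements land in one
    // bucket and an unsuccessful lookup scans the entire chain every single time.
    #include <cstdio>
    #include <unordered_set>

    struct BadHash {
        size_t operator()(int) const { return 0; }  // deliberately terrible
    };

    int main() {
        std::unordered_set<int, BadHash> s;
        for (int i = 0; i < 10000; ++i) s.insert(i);
        for (int k = 0; k < 1000; ++k)
            s.count(123456);  // absent key: full O(n) scan of the bucket, each time
        std::printf("bucket 0 holds %zu of %zu elements\n", s.bucket_size(0), s.size());
        return 0;
    }
    ```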


  • 0
    K

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:

    @kevin36 Yeah but I don't like that argument. You don't even need a fancy data structure then. Just use a simple vector. Since we don't store duplicates, it'll contain at most 2^32 elements. So scanning it front to back takes O(1).

    Good point.


  • 2

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:

    I think "average" would be the proper term here.

    Done, I've updated the description.


  • 0
    S

    @StefanPochmann Brilliant analysis. Mad props.


  • 0
    E

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:
    @kevin36 Yeah but I don't like that argument. You don't even need a fancy data structure then. Just use a simple vector. Since we don't store duplicates, it'll contain at most 2^32 elements. So scanning it front to back takes O(1).

    The question doesn't seem to say no duplicates. Where do you see that? Can you describe your approach of using a vector of 2^32 elements instead of a map?

    If we did have duplicates, then every hashed key in the STL unordered map could be a linked list of worst-case size O(n). With duplicates, can we say the amortized insert/delete in the map is still O(1), or is only the "average" case O(1)?


  • 0

    @edaengineer said in Are hash tables ok here? They're not really O(1), are they?:

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:
    @kevin36 Yeah but I don't like that argument. You don't even need a fancy data structure then. Just use a simple vector. Since we don't store duplicates, it'll contain at most 2^32 elements. So scanning it front to back takes O(1).

    The question doesn't seem to say no duplicates. Where do you see that?

    The specification for insert says "Inserts an item val to the set if not already present".

    Can you describe your approach of using a vector of 2^32 elements instead of a map?

    Store value i at index i as a simple boolean flag.
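
    A minimal sketch of that (ignoring negative values for simplicity; the table itself is 2^32 bits, about 512 MB, so this is more a thought experiment than practical code):

    ```cpp
    // Direct addressing over the whole 32-bit universe: the value is the index.
    // Each operation touches one fixed slot, so it's "O(1)" in the bounded-universe
    // sense discussed above. Non-negative values only, to keep the sketch short.
    #include <cstdint>
    #include <vector>

    class DirectAddressSet {
        std::vector<bool> present = std::vector<bool>(1ULL << 32, false);  // 2^32 bits
    public:
        bool insert(uint32_t val) {
            if (present[val]) return false;
            present[val] = true;
            return true;
        }
        bool remove(uint32_t val) {
            if (!present[val]) return false;
            present[val] = false;
            return true;
        }
        bool contains(uint32_t val) const { return present[val]; }
    };
    ```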

    If we did have duplicates, then every hashed key in the STL unordered map could be a linked list of worst-case size O(n). With duplicates, can we say the amortized insert/delete in the map is still O(1), or is only the "average" case O(1)?

    What do you mean "still amortized O(1)"? Above I argued that it wasn't amortized O(1).


  • 0
    E

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:

    @edaengineer said in Are hash tables ok here? They're not really O(1), are they?:

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:
    @kevin36 Yeah but I don't like that argument. You don't even need a fancy data structure then. Just use a simple vector. Since we don't store duplicates, it'll contain at most 2^32 elements. So scanning it front to back takes O(1).

    The question doesn't seem to say no duplicates. Where do you see that?

    The specification for insert says "Inserts an item val to the set if not already present".

    Ok now I see it.

    If we did have duplicates, then every hashed key in the STL unordered map could be a linked list of worst-case size O(n). With duplicates, can we say the amortized insert/delete in the map is still O(1), or is only the "average" case O(1)?

    What do you mean "still amortized O(1)"? Above I argued that it wasn't amortized O(1).

    Value i at index i can take extra space of O(max value).

    Yes, I see that you are making a distinction between "amortized" and "average". For amortized, your example was a map with n elements in which searching takes O(n), and we somehow keep searching that worst-case map repeatedly, so no actual "amortization" (in the English sense) occurs. But doesn't amortized complexity refer to a random sample (any definitive link defining amortized complexity analysis?), and wouldn't it by definition exclude that worst-case scenario occurring repeatedly? In that case amortized should be the same as average.

    By "still amortized O(1)" I assumed amortized to be the same as average complexity.

    The worst-case scenario that you describe can occur if we allow duplicates to be inserted into the set and want getRandom to still return all elements with equal probability.

    Any way to do this question with all O(1) complexity with duplicates allowed?


  • 0

    @edaengineer said in Are hash tables ok here? They're not really O(1), are they?:

    Value i at index i can take extra space of O(max value).

    Yes, though I was inaccurate as you can have negative values and many languages don't have negative indexes. Anyway, so it's O(number of possible values) = O(2^32) = O(1).

    But doesn't amortized complexity refer to a random sample

    No, that's average complexity.

    Do you know and understand the typical example for amortized analysis, appending to a vector in O(1) amortized time?
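
    For reference, a tiny counting experiment (my own sketch) of that example: the occasional reallocation copies the whole vector, but those copies sum to less than 2n over n appends, so each append is O(1) amortized.

    ```cpp
    // Count the element copies caused by n appends to a capacity-doubling array.
    // The copies sum to 1 + 2 + 4 + ... < 2n, i.e. O(1) amortized per append.
    #include <cstdio>

    int main() {
        const long long n = 1000000;
        long long size = 0, capacity = 1, copies = 0;
        for (long long i = 0; i < n; ++i) {
            if (size == capacity) {  // full: "reallocate" into a twice-as-large array
                copies += size;      // moving the existing elements costs O(size)
                capacity *= 2;
            }
            ++size;                  // the append itself is O(1)
        }
        std::printf("%lld appends caused %lld element copies (< 2n)\n", n, copies);
        return 0;
    }
    ```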


  • 0
    E

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:

    @edaengineer said in Are hash tables ok here? They're not really O(1), are they?:

    Value i at index i can take extra space of O(max value).

    Yes, though I was inaccurate as you can have negative values and many languages don't have negative indexes. Anyway, so it's O(number of possible values) = O(2^32) = O(1).

    But doesn't amortized complexity refer to a random sample

    No, that's average complexity.

    Do you know and understand the typical example for amortized analysis, appending to a vector in O(1) amortized time?

    Yes, the worst case is when the vector needs resizing, but once resized you can append in O(1) time.

    But why do you assume the hashmap will let an element lookup continue being O(n) if it's taking n steps and the same element is being looked up repeatedly, when we talk about amortized complexity?
    It will resize or modify its hashing so the lookup is not O(n).


  • 0

    @edaengineer said in Are hash tables ok here? They're not really O(1), are they?:

    But then why do you assume the hashmap will let an element lookup continue being O(n)

    Because I think it's so unlikely to happen that hashmap implementations don't necessarily bother avoiding it, as that would come at a price as well, a price perhaps not worth it.

    I just checked two things:

    C++: unordered_map::find says "Constant on average, worst case linear in the size of the container". If it were constant amortized time, why wouldn't they say so? They do say it for vector::push_back.

    Java: HashMap, if I understand that description correctly (didn't read it thoroughly yet), uses linear lists for small buckets and trees with O(log n) for large buckets. And I think this and this are where they're searched, and I don't see any reorganization efforts. Do you?

    It will resize or modify its hashing

    Can you show evidence for that? Which implementations do that?


  • 0
    E

    @StefanPochmann said in Are hash tables ok here? They're not really O(1), are they?:

    @edaengineer said in Are hash tables ok here? They're not really O(1), are they?:

    But then why do you assume the hashmap will let an element lookup continue being O(n)

    Because I think it's so unlikely to happen that hashmap implementations don't necessarily bother avoiding it, as that would come at a price as well, a price perhaps not worth it.

    I just checked two things:

    C++: unordered_map::find says "Constant on average, worst case linear in the size of the container". If it were constant amortized time, why wouldn't they say so? They do say it for vector::push_back.

    Java: HashMap, if I understand that description correctly (didn't read it thoroughly yet), uses linear lists for small buckets and trees with O(log n) for large buckets. And I think this and this are where they're searched, and I don't see any reorganization efforts. Do you?

    It will resize or modify its hashing

    Can you show evidence for that? Which implementations do that?

    Very useful links!

    This actually supports the idea that the notion of "amortized complexity" for an unordered map / hash map isn't meaningful (and not that amortized complexity is O(n) for maps), which is something that I was trying to get at in my first post on this topic (but couldn't fully phrase it then, and your post helped).

    In other words, since no rehashing or collision avoidance is happening on lookups, it's reasonable to talk about an average complexity of O(1) for a map instead of talking about an amortized complexity of O(n). The reason is that the notion of amortization (which means whittling down or distributing a cost over a larger sequence of operations) is meaningful only when there is a substantial, abnormally high cost incurred in an otherwise regularly occurring single step of a function, as in the case of vector insertion. That is a regular operation which, after n steps, will incur an O(n) cost even for the most average/random input data. So it makes sense to talk about the amortized complexity of vector insertion.

    But nothing like that happens for hash map find. For an average/random data sample the O(n) cost wouldn't occur. So it doesn't make sense, and may not be correct, to talk about amortized complexity for hash map find. Using the average-case O(1) complexity for it is reasonable for most practical problems.

    This may not cover the duplicates scenario where n duplicates are all in one linked list, making hash map find O(n), but that is a worst-case scenario (we probably shouldn't refer to it as a case of amortized complexity, as the amortization notion doesn't make sense for a hashmap), and I think that when talking about the complexity of an approach it's reasonable to assume O(1) for hash map find.


  • 0
    8

    I'm glad to tell you I found a way to do these operations in O(1) in the worst case.
    I convert each value into binary and regard it as a string.
    So I can use a trie for storage.
    Strictly speaking, the time complexity is O(32) (or O(64) if the value type is long long).
    But it's better than O(2^32). :)
    You can see my code here.
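
    Roughly, the idea could look like this sketch (my reconstruction, not the linked code); getRandom would still need something extra, e.g. an array of the stored values, to pick uniformly in O(1).

    ```cpp
    // Binary trie keyed on the 32 bits of the value: insert, remove and contains
    // each follow at most 32 links, so they are O(32) = O(1) in the worst case.
    // Sketch only: nodes are never freed (lazy deletion).
    #include <cstdint>

    struct TrieNode {
        TrieNode* child[2] = {nullptr, nullptr};
        bool present = false;  // marks a complete 32-bit key ending at this node
    };

    class BitTrieSet {
        TrieNode root;
    public:
        bool insert(uint32_t val) {
            TrieNode* node = &root;
            for (int b = 31; b >= 0; --b) {          // most significant bit first
                int bit = (val >> b) & 1;
                if (!node->child[bit]) node->child[bit] = new TrieNode();
                node = node->child[bit];
            }
            if (node->present) return false;         // already in the set
            node->present = true;
            return true;
        }
        bool remove(uint32_t val) {
            TrieNode* node = find(val);
            if (!node || !node->present) return false;
            node->present = false;                   // lazy delete
            return true;
        }
        bool contains(uint32_t val) {
            TrieNode* node = find(val);
            return node && node->present;
        }
    private:
        TrieNode* find(uint32_t val) {
            TrieNode* node = &root;
            for (int b = 31; b >= 0 && node; --b)
                node = node->child[(val >> b) & 1];
            return node;
        }
    };
    ```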

