It's NP-hard

  • 12

    I think this problem is NP-hard. For proof, I'll reduce the set cover decision problem to it. The example there is:

    U = {1, 2, 3, 4, 5}
    S = {{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}}

    Now I'd encode that for Minimum Unique Word Abbreviation like this:

    target: "oooo"
    dict:   "looo" (word 1)
            "lloo" (word 2)
            "lolo" (word 3)
            "olll" (word 4)
            "oool" (word 5)

    The five dictionary words correspond to the five elements of U. And every word has four letters, corresponding to the four subsets in S. You can see for example the second subset {2, 4} encoded as the second colum in the dictionary, which has l in words 2 and 4 (and otherwise only o).

    (Side note: You can build an abbreviation by picking letters and then replacing the unpicked ones with numbers. For example, when you have leetcode and pick the letters l, t and the last e, you get l2t3e. This way you can get all 2|word| possible abbreviations from the 2|word| possible ways of picking, as I've done in the first solution here.)

    Now an optimal abbreviation for the target is "o2o", which we get by picking the first "o" because that distinguishes the target from words {1, 2, 3} and by picking the last "o" because that distinguishes the target from words {4, 5}. Just like picking the subsets {1, 2, 3} and {4, 5} cover all of U in the set cover problem.

    A little problem is that if I switch the third and fourth element of S, then an optimal abbreviation picks the first and third "o", so instead of "o2o" we'd get "o1o1". That's a different length. So instead of the above encoding, I'd really do this, inserting "x" before and after each letter:

    target: "xoxoxoxox"
    dict:   "xlxoxoxox"

    Now "1o5o1" is an optimal abbreviation, and when I switch third and fourth element of S again, I instead get "1o3o3", which still has the same length.

    To be precise and usable: When I pick k letters, the optimal abbreviation will have those k letters as well as k+1 numbers, so length 2k+1. That means minimizing the abbreviation length is equivalent to minimizing the number of picked letters, which is equivalent to minimizing the number of picked elements of S in the decision problem.

    Summary: To solve a given set cover decision problem instance, I translate it to a Minimum Unique Word Abbreviation instance like above, and then from the computed optimal abbreviation, I subtract 1 from its length and divide it by 2 to get the number of picked letters, which is the minimum number of elements of S that cover U. And with that, I can directly answer the original set cover instance.

    Moral of the story: Don't feel bad about writing a brute force solution :-)

  • 3

    I realized it's NP-hard (and its equivalent decision problem is NP-complete), even though I thought about reducing the satisfiability problem to it instead. But I still was thinking whether we can do a brute force solution that, while checking possible abbreviations, doesn't have to go through the (filtered) dictionary to check whether an abbreviation qualifies.

    Essentially, we're looking for some fancy dictionary data structure that can look up wildcards instead of regular strings. I thought about using a modified trie for that, but I don't think it can work, and a regular trie would have its performance degraded with lots of wildcards.

  • 0

    Quick Question:

    In your side note, you already know 'Generalized Abbreviation' is 2^|word|, why not just do polynomial reduction from that problem (kinda same steps in your later writings)?

  • 0

    @readman How are you proposing to reduce that problem to this one? Generalized abbreviation asks you to generate all abbreviations. This one asks to find just one. You can't easily do a polynomial reduction from a problem asking to find 2^N thing to a problem that gives you just one thing as an answer. Or do I miss something?

  • 0

    @SergeyTachenov Yea, but what if you can only find 'one thing' after tried all others (he just provide prove in this part)

  • 0

    @readman I'm not sure I follow. Finding one thing by trying all others means reducing this problem to generalized abbreviation, not the other way around.

  • 0
    This post is deleted!

  • 0

    @horizon What language is that?

  • 0

  • 1

    I thought more over it,

    • We can assume, we are going to ignore items in dict whose length is not same at target length
    • It looks like we need to find a char in the target which is not present in any item of the dictionary at the same index as the index in target.

    The smallest non-matching abbrv will be of the form:
    "number1" + char + "number2" (except when abbrv == target.length), where number1, number2 are optional.
    We can start searching from the both ends and move inside as length will increase as we move towards the center of the target.

    For e.g:
    target = "apkple".
    Test whether "a" is present at 0th index in any item. If it's not then result will be "a" + target.length-1.
    "e" at 5th index. If not then 5e

    "p" at 1st index. If not then 1p4
    "l" at 4th index. If not then 4l1

    "k" at 2nd index. If not then 2k3
    "p" at 3rd index. If not then 3p2

    Do you see any gap in the argument?

  • 0

    @horizon I'm not familiar with Scala and LeetCode doesn't support it, either. Not really interested in learning Scala to be able to test your code, sorry.

Log in to reply

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.