Java solution based on radix bucket sorting, beats 92.44%, nearly O(n) in #chars/string


  • 0
    S

    The idea is to separate the process into two parts:

    Radix bucket sorts

    If concat(a,b) is a palindrome, and n = Math.min(a.length, b.length),
    then the first n characters of a match the last n characters of b (in reverse order).

    I've used an approach based on bucket sorts to very efficiently match the strings that satisfy this condition (it is maximally O(n) in the number of characters, for each string.)

    We work with two collections of strings: one collection are the possible left parts (a in the concatenation example), and the other all the possible right parts. At the start of an iteration, the first offset characters of strings in the left collection all match the offset characters of strings in the right collection.

    The strings of both collections are then (separately) bucketed based on their next characters (i.e. 26 possible buckets). Every two corresponding left & right buckets now match on offset+1 characters, and are used for the next iteration.

    .

    Palindrome pre/suffix checking

    Say a is one of the strings in the left collection. When offset == length(a), we know that concat(a,b) is a suffix (for every b in the corresponding right collection) IF AND ONLY IF the character sequence in b leading up to offset is a palindrome itself.

    So, at the end of a number of iterations, the problem is reduced to checking palindromic prefixes in the corresponding collection.

    It works similar the other way around, but with palindrome prefixes instead of suffixes.

    .

    Code

    public class Solution {
        public List<List<Integer>> palindromePairs(String[] words) {
            List<List<Integer>> res = new LinkedList<List<Integer>>();
            LinkedList<Integer> currLeft = new LinkedList<Integer>();
            LinkedList<Integer> currRight = new LinkedList<Integer>();
            for (int i = 0; i < words.length; i++) {
                currLeft.add(i);
                currRight.add(i);
            }
            
            constructPalindromePairs(words, currLeft, currRight, 0, res);
            return res;
            
        }
        
        public void constructPalindromePairs(String[] words, LinkedList<Integer> left, LinkedList<Integer> right, int offset, List<List<Integer>> res) {
            
            LinkedList<Integer>[] bucketsLeft = (LinkedList<Integer>[]) new LinkedList[26];
            LinkedList<Integer>[] bucketsRight = (LinkedList<Integer>[]) new LinkedList[26];
            
            for (int i : left) {
                // If we've checked the entire length of the string, check counterparts for palindrome pre-/suffixes
                if (offset >= words[i].length()) {
                    LinkedList<Integer> matches = checkPalindromes(words, right, offset, false, i);
                    for (int match : matches) {
                        LinkedList<Integer> pair = new LinkedList<Integer>();
                        pair.add(i);
                        pair.add(match);
                        res.add(pair);
                    }
                } else {
                    // If not, sort in buckets.
                    int radix = words[i].charAt(offset) - 'a';
                    if (bucketsLeft[radix] == null) bucketsLeft[radix] = new LinkedList<Integer>();
                    bucketsLeft[radix].add(i);
                }
            }
            for (int i : right) {
                // If we've checked the entire length of the string, check counterparts for palindrome pre-/suffixes
                if (offset >= words[i].length()) {
                    LinkedList<Integer> matches = checkPalindromes(words, left, offset, true, i);
                    for (int match : matches) {
                        LinkedList<Integer> pair = new LinkedList<Integer>();
                        pair.add(match);
                        pair.add(i);
                        res.add(pair);
                    }
                } else {
                    // If not, sort in buckets.
                    int radix = words[i].charAt(words[i].length() - 1 - offset) - 'a';
                    if (bucketsRight[radix] == null) bucketsRight[radix] = new LinkedList<Integer>();
                    bucketsRight[radix].add(i);
                }
            }
            for (int i = 0; i < 26; i++) {
                if (bucketsRight[i] == null || bucketsLeft[i] == null) continue;
                constructPalindromePairs(words, bucketsLeft[i], bucketsRight[i], offset+1, res);
            }
        }
        
        public LinkedList<Integer> checkPalindromes(String[] words, LinkedList<Integer> right, int offset, boolean suffix, int filter) {
            LinkedList<Integer> res = new LinkedList<Integer>();
            for (int i : right) {
                if (i == filter) continue;  // only check DISTINCT indices.
                char[] cs = words[i].toCharArray();
                boolean isPalindrome = true;
                int from, to;
                if (suffix) {
                    if (offset == cs.length) continue;  // avoid adding duplicates.
                    from = offset;
                    to = cs.length - 1;
                } else {
                    from = 0;
                    to = cs.length - 1 - offset;
                }
                while (from < to) {
                    if (cs[from++] != cs[to--])  {
                        isPalindrome = false;
                        break;
                    }
                }
                if (isPalindrome) {
                    res.add(i);
                }
            }
            return res;
        }
    }

  • 0
    S

    Some additional notes:

    • The time complexity is linear in the number of characters of the strings during the radix bucketing step, but it's pretty hard to formulate the average complexity of the palindrome check. If anyone has any ideas...? Worst case complexity is probably when each string matches all of the others during the bucketing procedure, but all differ exactly 1 in length (e.g. "a", "aa", "aaa", "aaaa", ...)

    • I use an array of lists (ugly, I know) because I expect the bucket occupation to be sparse, and this allows for character-based indexing without adding a whole bunch of dummies to an arraylist.

    • It's pretty straightforward to get rid of the duplication (about 50%) in the main loop.

    • Checking for distinct indices sooner (e.g. when one of the collections is singleton) might speed up execution somewhat.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.