Detailed explanation on building up lps[] for KMP algorithm


  • 1
    L

    At first special thanks to @jianchao.li.fighter's introduction to KMP algorithm and 2 helpful links in his answer to this problem. Here is his answer: https://leetcode.com/discuss/38998/explained-4ms-easy-c-solution and below are the 2 links to refer to.

    1. http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/
    2. http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/

    For those who don't know what KMP is please read the content in the 2 links first. And here I just add some detailed supplementary comments on how to build up the lps[] based on the original code and comments from the 2nd link. Hope this will help those who are still confused how the lps[] is built (especially for the tricky part) after read the content in the 2 links above.

    void computeLPSArray(char *pat, int M, int *lps)
      {
          int len = 0;  // lenght of the previous longest prefix suffix
          int i;
    
          lps[0] = 0; // lps[0] is always 0
          i = 1;
    
          // the loop calculates lps[i] for i = 1 to M-1
          while (i < M)
          {
             //example "abababca" and i==5, len==3. The longest prefix suffix is "aba", when pat[i]==pat[len],
             //we get new prefix "abab" and new suffix "abab", so increase length of  current lps by 1 and go to next iteration. 
             if (pat[i] == pat[len])
             {
               len++;
               lps[i] = len;
               i++;
             }
             else // (pat[i] != pat[len])
             {
               if (len != 0)
               {
                 len = lps[len-1];
                 //This is tricky. Consider the example "ababe......ababc", i is index of 'c', len==4. The longest prefix suffix is "abab",
                 //when pat[i]!=pat[len], we get new prefix "ababe" and suffix "ababc", which are not equal. 
                 //This means we can't increment length of lps based on current lps "abab" with len==4. We may want to increment it based on
                 //the longest prefix suffix with length < len==4, which by definition is lps of "abab". So we set len to lps[len-1],
                 //which is 2, now the lps is "ab". Then check pat[i]==pat[len] again due to the while loop, which is also the reason
                 //why we do not increment i here. The iteration of i terminate until len==0 (didn't find lps ends with pat[i]) or found
                 //a lps ends with pat[i].
               }
               else // if (len == 0)
               { // there isn't any lps ends with pat[i], so set lps[i] = 0 and go to next iteration.
                 lps[i] = 0;
                 i++;
               }
             }
          }
      }  
    

  • 1
    L

    Here is the whole Java solution using KMP algorithm.

    class Solution_implementKMP {
        public int strStr(String haystack, String needle) {
            int n = haystack.length(), k = needle.length();
            if(n<k) return -1;
            if(k==0) return 0;
            int[] lps = new int[k];
            buildPattern(lps,needle);
            int partialMatchLength = 0;
            for(int i=0;i<n;){
                if(haystack.charAt(i)==needle.charAt(partialMatchLength)){
                    if(partialMatchLength==k-1) return i-partialMatchLength;
                    partialMatchLength++;
                    i++;
                }
                else{
                    if(partialMatchLength>0) {
                        partialMatchLength = lps[partialMatchLength-1];
                    }
                    else i++;
                }
                
            }
            return -1;
        }
        
        public void buildPattern(int[] lps, String needle){
            lps[0] = 0;
            int i = 1, currLength = 0; //current length of longest prefix suffix
            while(i<lps.length){
                if(needle.charAt(i)==needle.charAt(currLength)){
                    currLength++;
                    lps[i] = currLength;
                    i++;
                }
                else{
                    if(currLength!=0) currLength = lps[currLength-1];
                    else{
                        lps[i] = 0;
                        i++;
                    }
                }
            }
        }
    }

  • 0
    J

    I think it is MP, but not KMP.
    KMP jump table is more complex

    void getNext(char* p, int m, int next[])
    {
    	int j = 0, k = -1;
    	next[0] = -1;
    	while(j < m) {
    		while(k > -1 && p[k] != p[j])
    			k = next[k];
    		j++;
    		k++;
    		if(p[k] == p[j])
    			next[j] = next[k];
    		else
    			next[j] = k;
    	}
    }

  • 0
    L

    Hi @j.c , thanks for your comment. Actually it was the first time I heard KMP algorithm, I didn't have any idea about MP and KMP. I will try to figure it out. :)


  • 0
    L

    I didn't see any inner difference, what's your point bro?


  • 0
    J

    KMP not only build jump table for prefix matching, but also optimize it to avoid recheck the same character when jump back. ex, c[n] is not match, you jump to check c[lps[n]], since c[lps[n]] == c[n], exactly not match, need jump again.


  • 0
    L

    Hi @j.c I finally got it. Special thanks to you!


  • 2
    A

    This video really helped to get the intuition behind LPS[ ] array computation - https://www.youtube.com/watch?v=tWDUjkMv6Lc&lc=z12bxp2htrrgfzxdy225hrgbqm2sj3a5s04


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.