Solution by ananai


  • 0

    Approach #1 Brute Force [Accepted]

    Algorithm

    Start two pointers i and j from the beginning of a haystack and a needle respectively. Compare every element of the needle with every element of the haystack if there is a match keep moving pointers, otherwise, move the pointer of the needle to its head and eliminate all previous matches by moving the pointer of the haystack j steps back and adding 1 to start new comparison. In other words move i by i-j+1.

    C++

    int strStr(string haystack, string needle) {
    
            if(needle.length() == 0) {
                return 0;
            }
    
            if(haystack.length() == 0) {
                return -1;
            }
    
            int i = 0, j = 0;
    
            while(i<haystack.length()) {
                if(haystack[i] == needle[j]) {
                    i++;
                    j++;
                } else {
                    i = i - j + 1;
                    j = 0;
                }
                if(j == needle.length()) {
                    return i-j;
                }
            }
    
            return -1;
    }
    

    Complexity Analysis

    • Time complexity : $$O(nm)$$.

    Because in the worst case we compare every element of the needle with every element element of the haystack, the total time complexity is $$O(nm)$$, where n and m are lengths of the haystack and the needle respectively.

    • Space complexity : $$O(1)$$.

    Approach #2 Substring Match [Accepted]

    Algorithm

    Start the pointer i from the head of a haystack. Compare every substring of the haystack with length equal to the length of the needle. If there is a match return the index of the pointer i.

    C++

    int strStr(string haystack, string needle) {
    
            if(needle.length() == 0) {
                return 0;
            }
    
            if(haystack.length() == 0) {
                return -1;
            }
    
            if(haystack.length() < needle.length()) {
                return -1;
            }
    
            for(int i = 0; i<haystack.length() - needle.length() + 1; i++) {
                if(haystack.substr(i, needle.length()) == needle) {
                    return i;
                }
            }
    
            return -1;
    }
    

    Complexity Analysis

    • Time complexity : $$O(nm)$$.

    Iteration through the haystack takes n time and substr operation of STL takes m time, the total time complexity is $$O(nm)$$, where n and m are lengths of the haystack and the needle respectively.

    • Space complexity : $$O(1)$$.

    Approach #3 Knuth–Morris–Pratt (KMP) Algorithm [Accepted]

    Algorithm

    In contrast to naive algorithm KMP algorithm stores the history of previous matches of the haystack and the needle. For this reason lets run two pointers i and j starting from both heads of the haystack and the needle. If characters at these pointers match then move both pointers, otherwise, lets find a suffix which is also a prefix in the needle. This means that this suffix is also a substring right before the pointer i in the haystack. Consequently, if this suffix is also a prefix we can start our pointer j right after that prefix in the needle. In this approach we can avoiding moving the pointer i backwards.

    This means that we need to store the history of matches between a suffix and a prefix in the needle and what point to start the matching if there is a mismatch between the haystack and the needle.

    Preprocessing Longest Suffix Prefix

    In order to store information for the longest suffix prefix run two pointers from the first j and the second i characters of the needle respectively. If there is a match between characters store the length of the match, otherwise, leave it 0.

    Lets consider the following example

    • haystack aabdacabdab
    • needle abdab

    C++

    vector<int> longestSuffixPrefix(string needle) {
    
        vector<int> lsp(needle.length(), 0);
    
        int i = 1, j = 0;
    
        while(i<needle.length()) {
    
            if(needle[i] == needle[j]) {
                j++;
                lsp[i] = j;
                i++;
            } else {
    
                if(j != 0) {
                    j = lsp[j-1];
                } else {
                    i++;
                }
            }
        }
    
        return lsp;
    }
    
    Running KMP algorithm on the haystack and the needle

    C++

    int kmp(string haystack, string needle) {
    
        vector<int> lsp = longestSuffixPrefix(needle);
    
        int i = 0, j = 0;
    
        while(i<haystack.length()) {
    
            if(haystack[i] == needle[j]) {
                i++;
                j++;
            } else {
    
                if(j != 0) {
                    j = lsp[j-1];
                } else {
                    i++;
                }
            }
            if(j == needle.length()) {
                return i - j;
            }
        }
    
        return -1;
    }
    
    int strStr(string haystack, string needle) {
    
        if(needle.length() == 0) {
            return 0;
        }
    
        if(haystack.length() == 0) {
            return -1;
        }
    
        return kmp(haystack, needle);
    }
    

    Complexity Analysis

    • Time complexity : $$O(n+m)$$.

    Preprocessing of the longest suffix prefix array takes m time and the searching for the needle in the haystack takes n respectively, the total time complexity is $$O(n+m)$$, where n and m are lengths of the haystack and the needle respectively.

    • Space complexity : $$O(m)$$.

    We need m space to store information on the longest suffix prefix substring.


    Approach #4 Z Algorithm [Accepted]

    Algorithm

    Z Algorithm generates an array Z on a string S where every element of Z array Z[i] is the longest substring starting from a index i which is also a prefix of the string S.

    The idea is to concatenate the needle with the haystack using # character as a separation point. In this case the needle will be the prefix of a resulting string.

    needle#haystack

    • haystack aabdacabdab
    • needle abdab
    Running Z Algorithm on the haystack and the needle
    Lets construct Z array

    Lets maintain the interval [l, r] where a substring based on this interval is a prefix of the given string. The index i will point to the current position in the string.

    • if i>r then there is no substring which is also a prefix that starts before i and ends after i. Consequently, reset l and r and compute new interval [l, r] and get z[i] = r-l.

    • if i≤r then let k=i-l. z[i]≥min(z[k], r-i+1) because s[i..] matches s[k..] for at least r-i+1 characters.

      • if z[k]<r-i+1 means that we are in the substring that matches with the prefix of the given string. Instead of recalculating values for the current interval lets use values that were already computed.

      • if z[k]≥r-i+1 we are beyond the interval that was already processed. Consequently, compute z[i] by updating [l, r] and match from s[r+1]

    C++

    vector<int> zArray(string s) {
    
        int n = (int)s.length();
    
        vector<int> z(n, 0);
    
        int l = 0, r = 0, k;
    
        for(int i = 1; i<s.length(); i++) {
    
            if(i>r) {
    
                l = r = i;
    
                while(r<n && s[r-l] == s[r]) {
                    r++;
                }
    
                z[i] = r-l;
                r--;
    
            } else {
    
                k = i-l;
    
                if(z[k] < r-i+1) {
    
                    z[i] = z[k];
    
                } else {
    
                    l = i;
    
                    while(r<n && s[r-l] == s[r]) {
                        r++;
                    }
    
                    z[i] = r-l;
                    r--;
                }
            }
        }
    
        return z;
    }
    
    Search for the needle in the haystack
    int strStr(string haystack, string needle) {
    
        if(needle.length() == 0) {
            return 0;
        }
    
        if(haystack.length() == 0) {
            return -1;
        }
    
        vector<int> z = zArray(needle + "#" + haystack);
    
        for(int i = 0; i<z.size(); i++) {
            if(z[i] == needle.length()) {
                return i - needle.length() - 1;
            }
        }
    
        return -1;
    }
    

    Complexity Analysis

    • Time complexity : $$O(n+m)$$.

    The time complexity is $$O(n+m)$$ because characters at positions less than r are never compared and if a match is found we increase r by one, the total time complexity is $$O(n+m)$$, where n and m are lengths of the haystack and the needle respectively.

    • Space complexity : $$O(m+n)$$.

    We need m+n space to store information on the Z array.


    Approach #5 Rabin-Karp Algorithm [Accepted]

    Algorithm

    The idea is based on the Rabin-Karp algorithm to hash the string. Calculate the hash value for the string needle and then calculate hash values for every prefix substring of the string haystack. Iterate through hashed values of the haystack and compare all substring of length m the needle length in constant time.

    Calculate hash using

    haystackHash[i] = (haystack[i] - 'a' + 1) * pows[i]
    

    where pows are degress of hash.

    int rabinKarp(string haystack, string needle) {
    
        const int p = 31;
    
        vector<long long> pows(haystack.length());
    
        pows[0] = 1;
    
        for(int i=1; i<pows.size(); i++) {
          pows[i] = pows[i-1] * p;
        }
    
        vector<long long> haystackHash(haystack.length());
    
        for (int i=0; i<haystack.length(); ++i)
        {
            haystackHash[i] = (haystack[i] - 'a' + 1) * pows[i];
            if (i>0) {
                haystackHash[i] += haystackHash[i-1];
            }
        }
    
        long long needleHash = 0;
    
        for (int i=0; i<needle.length(); i++) {
          needleHash += (needle[i] - 'a' + 1) * pows[i];
        }
    
        for (int i = 0; i + needle.length() - 1 < haystack.length(); i++)
        {
            long long currentHash = haystackHash[i+needle.length()-1];
    
            if (i>0){
                currentHash -= haystackHash[i-1];
            }  
    
            if (currentHash == needleHash * pows[i])
                return i;
        }
    
        return -1;
    }
    
    Search for the needle in the haystack using hashed values
    int strStr(string haystack, string needle) {
    
        if(needle.length() == 0) {
            return 0;
        }
    
        if(haystack.length() == 0) {
            return -1;
        }
    
        if(haystack.length() < needle.length()) {
            return -1;
        }
    
        return rabinKarp(haystack, needle);
    }
    

    Complexity Analysis

    • Time complexity : $$O(n+m)$$.

    The average and the best time complexity of Rabin-Karp Algorithm is $$O(m+n)$$, but in the worst case it becomes $$O(mn)$$. The worst case occurs when all characters of the haystack are the same with all characters of the needle. But the problem asks for the first match so the time complexity will be $$O(m+n)$$, where n and m are lengths of the haystack and the needle respectively.

    • Space complexity : $$O(n)$$.

    We need n space to store hashes of the haystack string.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.