C# - find GCD of char counts to reduce iteration bounds


  • 1

    The algorithm is the same as others, iterate over the possible lengths of the substring and test for repeating pattern. But you don't need to test all lengths between 1 and half the string length. If you first tally counts of the each character then find their greatest common denominator you can test a smaller range of lengths.

    Consider that the problem is to break the string into equal length segments that are each equal (just another way to phrase it). Each segment will have the same set of characters with the same counts. So, you must be able to divide the counts for each character evenly across all segments. The gcd becomes the maximum number of segments you could make. If you were to try and have more segments you would not be able to evenly distribute each bucket of characters.

    Secondly, any lesser amount of segments must divide evenly into the gcd. You can determine the substring length for your test by just taking the string length and dividing by the number of segments.

    Unfortunately, this did not actually produce an improve OJ time. But it is likely the overhead of calculating the gcd which would only save time for larger datasets. In any case I found it interesting.

        public bool RepeatedSubstringPattern(string str)
        {
            int gcd = GCD(str);
            if (gcd < 2) return false;
            
            for (int numGroups = 2; numGroups <= gcd; numGroups++)
            {
                if (gcd % numGroups == 0)
                {
                    if (IsRepeated(str, str.Length/numGroups)) return true;
                }
            }
            return false;
        }
    
        public bool IsRepeated(string str, int substrLen)
        {
            int instances = str.Length / substrLen;
            for (int i = 1; i < instances; i++)
            {
                for (int j = 0; j < substrLen; j++)
                {
                    if (str[j] != str[i * substrLen + j]) 
                    {
                        return false;
                    }
                }
            }
    
            return true;
        }
        
        public int GCD(string str)
        {
            int gcd = 0;
            int[] counts = new int[26];
            for (int i = 0; i < str.Length; i++)
            {
                int index = str[i] - 'a';
                counts[index]++;
                gcd = counts[index];
            }
            
            for (int i = 0; i < counts.Length; i++)
            {
                if (counts[i] > 0)
                {
                    gcd = GCD(gcd, counts[i]);
                }
            }
            return gcd;
        }
        
        public int GCD(int a, int b)
        {
            while (b != 0)
            {
               int t = b; 
               b = a % b; 
               a = t; 
            }
            return a;
        }
    

    Another idea is to find the min start length is by looking at the largest distance between characters and start and end as well as the counts of the characters. One pass to get worst distance and counts. The length can be no less than the worst distance and also no less than the total length divided by the lowest count. Take the larger of the 2. Using this as the starting point you can improve your result even more than the GCD.

        public bool RepeatedSubstringPattern(string str)
        {
            if (str == null || str.Length < 2) return false;
    
            int min = MinLen(str);
            for (int len = min; len <= str.Length / 2; len++)
            {
                if (str.Length % len == 0)
                {
                    if (IsRepeated(str, len)) return true;
                }
            }
            return false;
        }
    
        public bool IsRepeated(string str, int substrLen)
        {
            int instances = str.Length / substrLen;
            for (int i = 1; i < instances; i++)
            {
                for (int j = 0; j < substrLen; j++)
                {
                    if (str[j] != str[i * substrLen + j]) 
                    {
                        return false;
                    }
                }
            }
    
            return true;
        }
        
        public int MinLen(string str)
        {
            int[] mapDist = new int[26]; // use 1 based index
            int[] mapFreq = new int[26];
            int maxDist = 1;
            int x = 0;
            for (int i = 0; i < str.Length; i++)
            {
                x = str[i] - 'a';
    
                // distance from head or last occurence of this char
                int distance = i + 1 - mapDist[x];
                maxDist = distance > maxDist ? distance : maxDist;
                mapDist[x] = i + 1;
    
                // frequency
                mapFreq[x]++;
            }
    
            int minFreq = str.Length;
            for (int i = 1; i < mapFreq.Length; i++)
            {
                minFreq = (mapFreq[i] > 0 && mapFreq[i] < minFreq) ? mapFreq[i] : minFreq;
            }
    
            // check for greater max distance from last occurence to end of string
            for (int i = 1; i < mapDist.Length; i++)
            {
                maxDist = (mapDist[i] > 0 && str.Length - mapDist[i] > maxDist) ? str.Length - mapDist[i] : maxDist;
            }
    
            return Math.Max(str.Length/minFreq, maxDist);
        }
    
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.