Given a string, find the longest repetitive substring.
Example:
input "banana", output "ana".
Explanation: "ana" is repeated twice.
input "ababab", output "abab".
Explanation: "abab" is repeated twice.
Isn't that implied? The longest repetitive substring cannot be substring of other repetitive substrings which are shorter in length. The longest repetitive substring, is however, guaranteed to be a substring of longer substrings (each of which will occur only once throughout the complete string).
A clear definition of the question is:
The Longest Repetitive Substring is a substring which occurs atleast twice in the original string and is longer than all other such substrings.
For more reading, Wikipedia has a good entry: https://en.wikipedia.org/wiki/Longest_repeated_substring_problem
I dind't find anything better than a prefix tree for this problem.
Follows a possible implementation:
#!python3
class Node:
def __init__(self):
self.nodes = [None] * 256
def insert(self, text):
node = self
result = 0
while node:
head, tail = text[0], text[1:]
next = node.nodes[head]
if next:
result += 1
node = next
else:
next = Node()
node.nodes[head] = next
if tail:
node = next
text = tail
else:
node = None
return result
def max_repeated_substring(text):
if not text:
return ''
text = text.encode()
root = Node()
bestCount = 0
best = ''
for i in range(len(text)):
count = root.insert(text[i:])
if count > bestCount:
best = text[i:i + count]
bestCount = count
return best
Excellent slides to understand how to tackle such problems:: https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/suffixtrees.pdf