# Python solution with detailed explanation

• Solution

Repeated DNA Sequences https://leetcode.com/problems/repeated-dna-sequences/?tab=Description

Brute Force

• The brute force solution is to use a rolling window and store every substring in a set and report a match. This can be inefficient in terms of memory usage - you store approximately 10 * N strings.

Karp Rabin Optimization

• The optimization should make use of reduced alphabet size and the fixed length of the strings. We use Karp-Rabin to accomplish this.
• There are 4 characters and this means we must use base 4.
• Compute the hash of first 10 characters - i.e. the base 4 representation of first 10 characters. How will you compute "123"? 1, 1 * 10 + 2 = 12, 12 * 10 + 3.
• While computing the hash for first number, compute 4^9
• Now as we roll the window, say the string was "1234"(base 10). Then hash for "234" = (123-100) * 10 + 4.
``````class Solution(object):
def findRepeatedDnaSequences(self, s):
"""
:type s: str
:rtype: List[str]
"""
if len(s) < 10:
return []
tags, N, base = {"A":0, "C":1, "G":2, "T":3}, 10, 4
cache, repeats = set([]), set([])
number, factor = 0, 1
for i in range(N):
number, factor = number*base + tags[s[i]], factor*base
factor /= base # Pre-compute 4^9