**Solution**

**Repeated DNA Sequences** https://leetcode.com/problems/repeated-dna-sequences/?tab=Description

**Brute Force**

- The brute force solution is to use a rolling window and store every substring in a set and report a match. This can be inefficient in terms of memory usage - you store approximately 10 * N strings.

**Karp Rabin Optimization**

- The optimization should make use of reduced alphabet size and the fixed length of the strings. We use Karp-Rabin to accomplish this.
- There are 4 characters and this means we must use base 4.
- Compute the hash of first 10 characters - i.e. the base 4 representation of first 10 characters. How will you compute "123"? 1, 1 * 10 + 2 = 12, 12 * 10 + 3.
- While computing the hash for first number, compute 4^9
- Now as we roll the window, say the string was "1234"(base 10). Then hash for "234" = (123-100) * 10 + 4.

```
class Solution(object):
def findRepeatedDnaSequences(self, s):
"""
:type s: str
:rtype: List[str]
"""
if len(s) < 10:
return []
tags, N, base = {"A":0, "C":1, "G":2, "T":3}, 10, 4
cache, repeats = set([]), set([])
number, factor = 0, 1
for i in range(N):
number, factor = number*base + tags[s[i]], factor*base
factor /= base # Pre-compute 4^9
cache.add(number)
for i in range(N, len(s)):
number = (number - tags[s[i-N]]*factor)*base + tags[s[i]]
if number in cache:
repeats.add(s[i-(N-1):i+1])
else:
cache.add(number)
return [x for x in repeats]
```