Google onsite interview questions

  • 35

    I recently went through a Google interview and wanted to contribute the questions I faced.
    Note: Google has apparently changed its interview slate, there weren't any system design interviews and all are coding questions. Might have to verify this clearly with your recruiters.

    1. Given a contiguous sequence of numbers in which each number repeats thrice, there is exactly one missing number. Find the missing number.
      eg: 11122333 : Missing number 2
      11122233344455666 Missing number 5

    2. Given a compressed string in which a number followed by [] indicate how many times those characters occur, decompress the string
      Eg. : a3[b2[c1[d]]]e will be decompressed as abcdcdbcdcdbcdcde.
      Assume the string is well formed and number will always be followed by a [].

    3. Given a tree representation of a html parsed output, wherein every block is a node in the tree, find if two html docs contain the same text.

    struct Node {
           string value;
           bool isMetadata;
           vector<Node*> children;

    For eg, consider the two documents


    Hello world

    </body></html> will be represented as Node1: value sample, children: <body> isMetadata: true Node2: value: <body> children:

    isMetadata: true Node3: value:

    : children: Hello world isMetadata: true Node4: value Hello world isMetadata: false

    and a second document

    <html><body>Hello world</body></html>

    and both documents are equivalent since they contain the same data.

    Note: the case of both documents fitting in memory is trivial, since it is just walking this tree list, consolidating data and comparing. As a follow up, solve the case where the whole documents may not be able to fit in memory.

    1. Given a 2D matrix M X N, support two operations:
      Query(row1, col1, row2, col2) such that I get the sum of all numbers in the rectangle ((row1, col1), (row1, col2), (row2, col1), (row2, col2)) and
      Update(row, col) to a new number

    And query is a very frequent operation and update is a rare operation, so query should be really fast, but update can be slower.

    Follow up: How would you solve this in a distributed fashion

    1. Verify if a given matrix is a Toeplitz matrix:
      Follow up, assume that the whole matrix cannot be fit in memory and should be read from a file, assume that a few rows and all columns can be read in, how to verify?

  • 3

    Solution for 4:
    Preprocess the matrix m[i][j] and have a second matrix wherein:

    sum[i][j] = sum[i][j-1] + sum[i-1][j] - sum[i-1][j-1] + m[i][j] whenever i-1, j -1 are valid.

    Query(row1, col1, row2, col2) =
    Consider m[i][j]

    1   2   3
    4   5   6
    7   8   9

    Then sum[i][j]

    1      3      6
    5     12     21
    12   27     45

    Now Query(1,1,2,2) = sum[2][2] - sum[0][2] - sum[2][0] + m[0][0] = 28
    Query(1,1,2,1) = sum[2][1] - sum[2][0] - sum[0][1] + m[0][0]= 13
    Query(2,1,2,1) = sum[2][1] - sum[1][1] - sum[2][0] + m[1][0] = 8

    Thus we can see a general algorithm for Query:

    Assuming row2 >= row1 && col2 >= col1
    Query(row1, col1, row2, col2) = sum[row2][col2] -sum[row1-1][col1] - sum[row1][col1 - 1] + m[row1 - 1][col1 - 1], whenever row1 -1 and col1 - 1 are valid.
    Thus query is O(1) operation

    Similarly, update(x, y, val) will have to update the sum matrix as follows:
    Let diff = m[x][y] - val
    Apply diff to sum[i][j] = sum[i][j] + diff where i ranges from x to M - 1(M being number of rows in m), j ranges from y to N - 1(N being number of columns in N)
    Update is O(MxN) order.

  • 1

    here Query(1,1,2,2) should be sum[2][2] - sum[0][2] - sum[2][0] + sum[0][0]
    although m[0][0]=sum[0][0]

  • 1

    @varun44 -- were you a new grad? Perhaps that's why you didn't get any system design interviews...

  • 1

    agreed! But even new grads may be asked simple system design questions.

  • 3

    @nidhirastogi @heidixia I am not a new grad, I have a good number of years of experience and a background in distributed systems, so I was very much expecting those kinds of questions. I was surprised as well when I was informed by the recruiter just before the day of the interview that google's "interviewing slate" whatever that may be, has changed and they weren't going to ask me any system design questions.

    I have edited my post to highlight making sure of this with the recruiters so that this wasn't a one off instance.

    Note: This interview was at the San Bruno Youtube office.

  • 0

    @varun44 wow! Thanks for sharing this! Good luck.

  • 0

    @varun44 I wrote this code for your question number 2

    def decom(mystr):
        mystack = []
        for i in mystr:
            if i == "]":
                word_list = []
                while True:
                    value = mystack.pop()
                    if value != "[" :
                        word_list.insert(0, value)
                counter = int(mystack.pop())
                combined_list = word_list
                while counter > 1:
                    combined_list = combined_list + word_list
                    counter = counter - 1
        print "".join(mystack)

  • 1

    @varun44 Can you explain the 3rd question little more?

  • 0

    My stab at the 1st question. Binary division / Divide and Conquer

    def tripletBinary(s):
        if not s: return None
        i, j = 0, len(s)-1
        while i < j-1:
            mid = (i+j)/2
            inI = inJ = mid
            while inI >= 0 and s[inI] == s[mid]:
                inI -= 1
            while inJ < len(s) and s[inJ] == s[mid]:
                inJ += 1
            if inJ - inI == 3: return s[mid]
            if inI > 0 and (inI + 1) % 3 != 0:
                j = inI
                i = inJ
        return s[i]

  • 1

    @nidhi32 How a3[b2[c1[d]]]e will be decompressed to abcdcdbcdcdbcdcde ? I am not able to understand the decompression technique. Could you explain a little bit.

  • 0

    My Stab at Q 2. Could easily modify to handle 2 or more digit multiples.

    def decomp(s):
        if not s: return ""
        mult, seq, curr = [], [], []
        for i in range(len(s)):
            if s[i].isdigit():
                curr = []
            elif s[i] == ']':
                if not mult or not seq: return "ERROR: Invalid Sequence"
                temp, m = seq.pop(), mult.pop()
                seed = ''.join(curr)
                curr = temp + [seed for _ in range(m)]
            elif s[i].isalpha(): curr.append(s[i])
        if curr: seq.append(curr)
        return ''.join([''.join(s) for s in seq])

  • 1


    a digit followed by [] means you duplicate whatever's in the [] by digit. Thus:
    a3[b2[c1[d]]]e -> a3[b2[cd]]e -> a3[bcdcd]e -> abcdcdbcdcdbcdcde

  • 0

    Regarding question 5, is an interviee expected to know what a Toeplitz matrix is, or is it perfectly okay to ask?

  • 0

    @pisskidney It is always encouraged to clarify and ask concepts and doubts you have. I did not know what a Toeplitz was until the interviewer explained it to me. It can be reasonable to expect that there will be very vague questions in which the candidate is expected to clarify every detail before solving.

  • 0

    Solution for 2:

    #include <iostream>
    #include <string>
    #include <locale>
    using namespace std;
    string Decompress(string compressedString)
        size_t pos1 = 0;
        size_t pos2 = compressedString.find_last_of(']');      // point to right square bracket
        if (string::npos == pos2)
            return compressedString;
        locale loc;
        string decompressedString;
        char c = compressedString[pos1];
        while (! isdigit(c, loc)) {
            c = compressedString[pos1];
        string loopCountString;
        while (isdigit(c, loc)) {
            c = compressedString[pos1];
        // pos1 point to left square bracket
        int loopCount = stoi(loopCountString);
        string decompressedSubString = Decompress(compressedString.substr(pos1 + 1, pos2 - pos1 - 1));
        for (int i=0; i<loopCount; i++) {
        if (pos2 + 1 < compressedString.length())
            decompressedString.append(compressedString.substr(pos2 + 1));
        return decompressedString;
    int main(int argc, const char * argv[]) {
        string cpTextString = "a3[b2[c1[d]]]e";
        cout << Decompress(cpTextString) << endl;
        return 0;

  • 1

    @varun44 Nice man, just wondering. Did you end up getting the job?

  • 3

    I am assuming the question is: we take some L and R, then we write L, L, L, L+1, L+1, L+1, ...., R, R, R; then we delete one of these numbers and concatenate all of it together into S. We get S and we want to know which number was removed.

    Fundamentally, to get a divide and conquer solution, we need to build some expectation of what the string is at some position. If we haven't removed a number on the left half, then when we check the middle, it will match our expectation, and if we have removed a number, it won't. This allows us to divide and conquer.

    Building that expectation in small cases is easy. Building it in a general case is extremely hard if you do not have a bound on L: when L is unbounded, it is impossible to distinguish if you are outside the length of L or not. For example, say your string is 4747 4747 4747 4748 4748 4748 4749 ... and you stopped looking. You could be in L = 4747, but you could also be in L = 4747 4747 4747 4748 4748 4748 4749.

    When L and R are reasonably bounded, then we can find them relatively quickly by brute force: try the first K digits to be L, and check that we have either (L)(L)(L+1)(L+1)(L+1) or (L)(L)(L)(L+1)(L+1), where this pattern extends enough that together with the bound on L we are sure. (The R case is similar.)

    After, we can build a divide and conquer. Suppose we are looking at the interval S[i]...S[j] and it is supposed to represent L, R. Let's see where M = (L+R)//2 is supposed to start assuming the representation of [L, M-1] is untouched. We need to know how many digits are used to write L, L+1, ..., M-1 3 times. Let F(x) be the number of digits used to write 1, 2,..., x: then we want 3*(F(M-1) - F(L-1)), and figuring out F is a classic problem. Afterwards, we can look from S[i + F(x)] on: if it starts with (M)(M)(M+1) [and M-1 is behind us], then we were missing M; if it starts with (M)(M)(M), then we can consider i += F(x) + len( (M)(M)(M) ), and L = M+1; otherwise we can consider j = i + F(x) and R = M-1.


    Traverse the tree at the same time. When both trees won't fit in memory, use a merkle hashing technique and compare the hashes of the root. You can find info on that approach here:

    Standard technique. Let S[r][c] be the sum of the subrectangle from the top left corner to the bottom right corner of (r, c). Then our query is simply query(r1, c1, r2, c2) = S[r2][c2] - S[r1-1][c2] - S[r2][c1-1] + S[r1-1][c1-1]. We can build S slowly by using S[r][c] = S[r-1][c] + S[r][c-1] - S[r-1][c-1].

    Every entry (r, c) maps to it's diagonal c-r. For example:

    group = {}
    for r, row in enumerate(A):
      for c, val in enumerate(row):
        g = c - r
        if g not in group:
          group[g] = val
        elif group[g] != val:
          return False
    return True

  • 1

    #1 seems like a really easy bitwise operation problem

    class Solution:
        def findNonRep(self, nums):
            bitCntr = [0]*32
            for num in nums:
                for i in range(0, 32):
                    bitVal = (num >> i) & 1
                    if bitVal:
                        bitCntr[i] += 1
            res = 0
            pow = 1
            for i in range(0, 32):
                if bitCntr[i]%3:
                    res += 2**i
            return res

  • 4

    Just curious, were you offered the job? Thanks for sharing the information. I found it very valuable.

Log in to reply

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.