Help me solving this String based problem from amazon interview

  • 0

    Given a paragraph of text, write a program to find the first shortest sub-segment that contains each of the given k words at least once. A segment is said to be shorter than other if it contains less number of words.

    Ignore characters other than [a-z][A-Z] in the text. Comparison between the strings should be case-insensitive.

    If no sub-segment is found then the program should output “NO SUBSEGMENT FOUND”.

    Input format :

    First line of the input contains the text.
    Next line contains k , the number of words given to be searched.
    Each of the next k lines contains a word.

    Output format :

    Print first shortest sub-segment that contains given k words , ignore special characters, numbers.If no sub-segment is found it should return “NO SUBSEGMENT FOUND”

    Sample Input :

    This is a test. This is a programming test. This is a programming test in any language.

    Sample Output :

    a programming test This

    Explanation :
    In this test case segment "a programming test. This" contains given four words. You have to print without special characters, numbers so output is "a programming test This". Another segment "This is a programming test." also contains given four words but have more number of words.

    Constraint :

    Total number of character in a paragraph will not be more than 200,000.
    0 < k <= no. of words in paragraph.
    0 < Each word length < 15

  • 1

    A python solution using moving window. Needs further optimization.

    def scan(para, keys):
        words = para.lower().replace('.', '').split()
        keys_pos = {x: [] for x in keys}
        win_begin = 0
        win_end = 0
        min_seg = (-1, len(words))
        for i, word in enumerate(words):
            if word not in keys_pos:
            win_end = i
            pos = keys_pos[word]
            if len(pos) > 1 and pos[0] == win_begin:
                del pos[0]
                win_begin = min(x[0] for x in keys_pos.values() if len(x) > 0)
            if all(len(x) > 0 for x in keys_pos.values()):
                valid_seg = (win_begin, win_end)
                if valid_seg[1] - valid_seg[0] < min_seg[1] - min_seg[0]:
                    min_seg = valid_seg
        if min_seg[0] == -1:
            return None
        return ' '.join(words[min_seg[0]:min_seg[1]+1])
    if __name__ == '__main__':
        para = "This is a test. This is a programming test. This is a programming test in any language."
        keys = ["this", "a", "test", "programming"]
        print(scan(para, keys))

Log in to reply

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.