Longest substring of distinct characters in a stream


  • 0
    I

    Return length of the Longest substring of distinct characters from a stream of character.
    The stream can be very long and saving the whole stream in memory is not an option.

    //following method is used to receive new char from the stream
    put(char c)
    

    This is same as the https://leetcode.com/problems/longest-substring-with-at-most-k-distinct-characters/
    but the stream makes it a little complicated


  • 0

    I think with a hash-table and a pointer which records the current start point will apply to this problem.


  • 0

    @xidui I have a similar idea, using sliding (rolling) window and hash for each character.


  • 0
    Z

    @xidui Can you elaborate your algorithm? Thanks!


  • 0
    G
    This post is deleted!

  • 0
    G
    This post is deleted!

  • 1

    @ZhackerZ Here is my algorithm
    I keep in map the last position of each dinstict char in stream. When there is a dublicate char ch, remove from map all chars with positions earlier or equal than last position of ch and reindex the other characters.
    Example :
    a b c d e f c d k l m n o p f
    0 1 2 3 4 5
    x x x 0 1 2 3
    x x x x 0 1 2 3 4 5 6 7 8 9
    x x x x x x 0 1 2 3 4 5 6 7 8 9,
    where x - removed character

    1. a arrives , maxLenght = 1, map = { a= 0}
    2. b arrives maxLenght = 2, map = { a= 0, b = 1}
    3. c arrives maxLenght = 3, map = { a= 0, b = 1, c= 2}
    4. d arrives maxLenght = 4, map = { a= 0, b = 1, c= 2, d = 3}
    5. e arrives maxLenght = 5 map = { a= 0, b = 1, c= 2, d = 3, e = 4}
    6. f arrives maxLenght = 6 map = { a= 0, b = 1, c= 2, d = 3, e= 4, f = 5}
    7. c arrives maxLenght = 6 map = {c= 3, d = 0, e= 1, f = 2}
    	 class LongestDistinctSubsequence {
    		 
    		 private int maxLength;			 
    		 private int pos;
    		 private Map<Character,Integer> freq;
    		 
    		 public LongestDistinctSubsequence()  {
    			 freq  =  new HashMap<>();
    		 }
    		 
    		 public void put(char ch)   {
    			 
    			 if  (!freq.containsKey(ch))   {					 
    				 freq.put(ch,  pos);	
    				 pos++;
    				 maxLength  =  Math.max(pos,  maxLength) ;
    			
    			 }
    			 else {
    				 int chPos =  freq.get(ch);					 
    				 Iterator<Entry<Character,Integer>> it  =  freq.entrySet().iterator();
    				 while (it.hasNext())  {
    					 Entry<Character,Integer> entry  =  it.next();						
    					 if  (entry.getValue()  <=  chPos)   {
    						 it.remove();
    					 }
    					 else  {
    						 freq.put(entry.getKey(),  freq.get(entry.getKey())  -  chPos  -  1);	
    					 }						
    				 }					 
    				 freq.put(ch,  pos  -  chPos  -  1);	
    				 pos  =  pos  -  chPos;					 				 
    				 
    			 }
    			
    		 }
    		 
    		 public int getMaxLength() {
    			 return maxLength;
    		 }
    	 } 
    

  • 0
    Z
    This post is deleted!

  • 0

    @ZhackerZ isn't your question the same as https://leetcode.com/problems/longest-substring-with-at-most-k-distinct-characters. There is a discuss abou it. Check in discuss


  • 0
    Z
    This post is deleted!

  • 0

    I think this problem is different from the LC problem since it has a memory limit to process the streaming data.
    So, the idea is to use sequential of numbers to index each char. Besides, we only need to return the length so that the data is not actually stored in the buffer. Only the indexes of the last appearance of every character are saved in an int[256] array.
    As a result, this implementation can be considered near O(1) if the BUFFER_SIZE is big enough.
    In other words, we only re-index the int[256] when the buffer is full, which is O(256).
    The function put() is considered O(1) most of the time.

    Here is my code:

    public class LongestDistinctCharacters {
    	private int BUFFER_SIZE = 300;
    	private int head;
    	private int tail;
    	private int max_len;
    	private int[] map;
    	LongestDistinctCharacters() {
    		map = new int[256];
    		Arrays.fill(map, -1);
    		head = 0;
    		tail = 0;
    		max_len = 0;
    	}
    	
    	private void shift() {
            // set the head index starting from 0;		
    		tail -= head;
    		for (int i = 0; i<256; i++) {
    			map[i] -= head;
    			map[i] = Math.max(map[i], -1); 
    		}
    		head = 0;
    	}
    	void put(char c){
    		int idx = map[c];
    		if ( (idx >= 0) && (idx >= head) && (idx < tail) ) {
    			// this char is in the range of current subsequence
    			max_len = Math.max(max_len, tail - head);
    			head = idx+1;
    		}
    		map[c]=tail++;
    		if ( tail >= BUFFER_SIZE) {
    			shift();
    		}
    	}
    	
    	int getMaxLen() {
    		return max_len;
    	}
    
    	public static void main(String[] args) {
    		LongestDistinctCharacters sol = new LongestDistinctCharacters();
    		
    		String input = "returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheLongestsubsequenceofdistinctcharacters"
    		+"returnlengthoftheabcdefghijklmnopqrsxyzasdsddsdaddadaa";
    		
    		//String input = "abcdbdedeabcdefbghij";
    		//String input = "aaaaaaaaaaaabbbbbbabbbbbbbbbb";
    		for (char c: input.toCharArray()){
    			sol.put(c);
    		}
    		System.out.println("Max length =" + sol.getMaxLen() );
    		System.out.println(input);
    	}
    
    }
    

  • 0
    D

    I have a question here. The problem description has longest "SUBSEQUENCE" and not "SUBSTRING". The solution given by @elmirap gives us the longest substring. Is it necessary that all characters in a subsequence should be contiguous in the original string? For example, assume the LCS problem from CLRS. We do consider subsequences where characters are non-contiguous in the original String.

    In this case wouldn't the best solution be to just get all distinct characters in the stream and be done with it? Have something like a HashSet. If a new guy comes, insert, else do nothing.


  • 0

    It seems that the algorithm search for substring, there is a comment

    This is same as the https://leetcode.com/problems/longest-substring-with-at-most-k-distinct-characters/
    but the stream makes it a little complicated


  • 0

    @elmirap Thanks, I have updated the title and description to prevent confusion.


  • 0

  • 0
    R

    Very easy solution:

    #!python3
    cur = ´´
    acc = 0
    best = 0
    for next in text:
       flag = -int(cur != next)    # can be -1 (aka ~0) or 0 
       acc = (acc + 1) & flag    # increments or reset acc
       best = max(best, acc)    # keeps the best result found yet
       cur = next                      # ready for next iteration
    return best
    

    Of course this loop can be converted into a state machine with a class that evolves at each call to method put but the essence of this solution does not
    change.

    This trick to use bool -> int conversion and then use the -1, 0 value to increment or reset is very convenient with C++ or Java because they are optimized very well by compilers.


  • 0
    M

    This can actually be done using a heap(min or max, doesn't matter). The heap will have a fixed size. Whenever a new character comes in, we'll check in the heap if that character exists or not. If it exists then remove that character and update the maximum length of the string with distinct integers. if it doesn't then add it to the heap.
    Total complexity-O(nlgn)


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.