# Delete duplicates from string and also do lexicographical order

• Given a string which only contains lowercase. You need delete the repeated letters only leave one, and try to make the lexicographical order of new string is smallest.

i.e:

bcabc

You need delete 1 'b' and 1 'c', so you delete the first 'b' and first 'c', the new string will be abc which is smallest.

If you try to use greedy algorithm to solve this problem, you must sure that you could pass this case:

• I can think of a O(n^2) solution.

//remove duplicates
string remove_dup_lexi(string str) {
if (str == "") return str;
int len = str.length();
vector<bool> is_last(len, true);
bool visited[26]={0};
for (int i=str.length()-1; i>=0; i--) {
if (visited[str[i]-'a']) is_last[i] = false;
visited[str[i]-'a'] = true;
}
bool ok[26]={0};
string result;
for (int i=0; i<str.length(); i++) {
if (ok[str[i]-'a']) continue;
bool has_better = false;
for (int j=i; j<str.length(); j++) {
if (ok[str[j]-'a']) continue;
if (str[j] >= str[i] && is_last[j]) break;
if (str[j] < str[i]) {
has_better = true;
break;
}
}
if (!has_better) {
result += str[i];
ok[str[i]-'a'] = true;
}
}
return result;
}

• Could you elaborate what do you mean by "make the lexicographical order of new string is smallest" ? with any use case

• @gotham I think the example is quite clear. There are many ways to delete duplicates, but the left string should be with the smallest lexicographical ordering during all possible results.

• This post is deleted!

• Wrote a backtracking solution to the problem by keeping track of the duplicates in the string and trying out all possible combinations of selections for the duplicates. I was timing myself, so for now I have all the outcomes stored in an ascending order priority queue so that the smallest lexicographical string is always on top. That way, I just return the top of the queue out of all possible results. Currently trying to see if I could possibly cache results to make it faster.

#include <iostream>
#include <vector>
#include <map>
#include <queue>
using namespace std;

string allCombos(string s);
void comboHelper(priority_queue<string, vector<string>, greater<string>> &sorter, string res, string s, map<char, int> dupsChecker, map<char, int> visited);

int main(){
string s = "cbacdcbc";
string res = allCombos(s);
cout<<res<<endl;
return 0;
}

string allCombos(string s){
map<char, int> dupsChecker, visited;
for(int i = 0; i<(int)s.length(); i++){
dupsChecker[s[i]]++;
}
priority_queue<string, vector<string>, greater<string>> sorter;
comboHelper(sorter,"",s,dupsChecker,visited);
return sorter.top();
}

void comboHelper(priority_queue<string, vector<string>, greater<string>> &sorter, string res, string s, map<char, int> dupsChecker, map<char, int> visited){
if((int)s.length() == 0){ //end of string
sorter.push(res);
return;
}

if(dupsChecker[s[0]] <= 1){ //current character doesn't have dups
string temp = s;
s = s.substr(1, s.length()-1);
comboHelper(sorter,res+temp[0], s, dupsChecker, visited);
return;
}
else{
if(visited[s[0]]){//we've already chosen one out of the current char's dups, so we remove the rest
dupsChecker[s[0]]--;
string temp = s;
s = s.substr(1, s.length()-1);
comboHelper(sorter,res, s, dupsChecker, visited);
}
else{// else alternate between choosing one and letting it go
dupsChecker[s[0]]--;
string temp = s;
s = s.substr(1, s.length()-1);
comboHelper(sorter,res, s, dupsChecker, visited); //letting it go
dupsChecker[temp[0]]++;
visited[temp[0]]++;
comboHelper(sorter,res+temp[0], s, dupsChecker, visited); //choosing it
}
}
}

• This post is deleted!

• string str = "santiscowgl";
sort(str.begin(), str.end());
str.erase(unique(str.begin(), str.end()), str.end());
cout<<str<<endl;

that is ok

• use set to store the character, and then output the set in ascending order, done.

• @Evan100 I guess the problem was not presented clearly in the first post. The problem is to remove duplicated characters from a given string such that the remaining string is smallest possible in lexicographical order, which means you cannot rearrange the order of the given string, just remove duplicated characters.
For example, for string s = "bba", after removing one instance of duplicated character b, the remaining string is "ba" and you cannot rearrange it to be "ab".

• @Santisco The problem is to ask for the smallest possible remaining string in lexicographical order after removing duplicated characters, so you cannot rearrange the relative order of characters in the given string. The call of sort will destroy the original order of characters in the string.
For example, for given string "bba", the answer is "ba" (not "ab").

• This is an o(n) solution. Basically, the idea is as follows. The first letter you want to add is the 'a'. You can do that if all letters before the first occurrence of 'a' are duplicated and they have one duplicate after the position of 'a'. To do so, keep track of the positions of the duplicates of all letters and traverse the array again discarding letters and adding letters from 'a' to 'z' in their positions (if they are not already added).

def find_min_lexi(s):
dups = collections.defaultdict(collections.deque)
for i in range(len(s)):
dups[s[i]].append(i)
out = []
j = 0
letter = 'a'
while j < len(s) and letter <= 'z':
if len(dups[letter]) == 0:
letter = chr(ord(letter)+1)
continue
pos = dups[letter][0]
while j < pos:
if len(dups[s[j]]) == 1:
out.append(s[j])
dups[s[j]] = []
elif len(dups[s[j]]) > 1:
dups[s[j]].popleft()
j += 1
out.append(letter)
j += 1
dups[letter] = []
return "".join(out)

print(find_min_lexico(s))

• My greedy algorithm in Python:

def clean_up_text(text):

N = len(text)
found = dict()
valid = [True] * N

for i in range(N):
current = text[i]
try:
j, is_good = found[current]
if is_good:
valid[i] = False
continue
else:
valid[j] = False
except KeyError:
pass
if i + 1 < N:
found[current] = (i, current < text[i + 1])

return “”.join(text[i]
for i in range(N)
if valid[i])

• The first thing that comes to my mind when I see "find the best" is a recursive type solution.
let "RES", at any point contain the best string that doesn't have a duplicate and is the best solution for the input string till that point.
if a new character needs to be accommodated in res,

1. If its not present in our best string then we can just add it
2. otherwise, we have to choose between the incoming character and the character which is already present. (make a choice at this point as to which will yield a better solution and remove , add the character at corresponding positions)

to do 2,
we need to look up the position of a character in res (random access)
we need to find the next element in res
we need to remove an element in res
we need to add an element in res at the end

A DS that i can think of to do this is a linkedlist + hash table (for random access).
'''

def dupsDel(string):

class node:
def __init__(self,l,r):
self.left, self.right = l, r

start = 0
for i,each in enumerate(string):
lastitem = each
else:
#get the already existing item that is equal in value to the current None
#removing this will cause the rank to decrease :)
#connect prevguy and next guy

lastitem = each

res = ""
while next!=None:
res +=next
return res

'''

• @pbg did you try the test case "bcba"? The output would be "cba" if I did not misunderstand your solution. But actually the output should be "bca"?

• My O(n) C++ solution. Just try to find 'a' and verify that all characters behind have a duplicate after, and erase them, just after the ones that you already validated. Go on until 'z'. Then just delete remaining duplicates.

string remove_dup(string s) {
vector<int> letters;
vector<int> beforeL;
int p = 0;
for (int i = 0; i < 26; i++) {
bool notR = true;
int curr_p = (int)s.find('a' + i);
if(curr_p == string::npos) continue;
beforeL.clear();
letters.clear();
letters.resize(26);
for(int j = p; j < curr_p; j++)
beforeL.push_back(s[j] - 'a');
for(auto k = curr_p + 1; k < s.size(); k++)
letters[s[k] - 'a']++;
for(auto j = 0; j < beforeL.size(); j++) {
if(letters[beforeL[j]] == 0) {
notR = false;
break;
}
}
if(!notR) continue;
s = s.erase(p, curr_p - p);
p++;
}
for(int i = p; i < s.size(); i++) {
int curr_p = (int)s.find(s[i]);
if(curr_p < i) {
s.erase(i, 1);
i--;
}
}
return s;
}

• @AlexHenkel , @pbg Nice idea. proof that it gives the best solution ?

• This solution below requires 2 passes to the array.
First pass updates the count of the characters in a dictionary and also makes a note of lexicographically lowest character in the string.
Second pass identifies the start of the output string and removes the duplicate characters before it. Starts the iteration again from the start of the output string and appends character to the result only if the next character is lexicographically larger than the current character or if the count of the current character is 1

from collections import defaultdict, deque

def delete_duplicates_from_string_lexographical(string):
"""delete the duplicate characters from the string and output the lexicographically
smallest possible result of all possible results
ex: input: cbacdcbc | output: acdb
ex: input: bbbcaba | output: cab"""
dict_char_count = defaultdict(int)
low = s[0]

# iterate the string to keep the count of the characters in the dict
# also update 'low' to have the smallest character in string
for c in string:
dict_char_count[c] += 1
if c < low:
low = c

# remove all the duplicate characters before start of the output string
# start of the output string is either the lexicographically least character
# or the character whose occurrence count is 1
# this iteration stops after identifying the start of output string
start = 0
for c in string:
if c is low or dict_char_count[c] == 1:
break
dict_char_count[c] -= 1
start += 1

# 'res' stores the output string (result)
# 'is_added' is an array that stores if the character is added to the the result
res = list()

# iterate the string from the 'start' of the output string after removing duplicates before it
# if the character is not added to the result, add it only if the next character is lexicographically
# larger than the current character or if the count of the current character is 1
for i in range(start, len(string)):
c = string[i]
if dict_char_count[c] == 1 or c < string[i+1]:
res.append(c)
dict_char_count[c] -= 1

return ''.join(res)

s = "cbacdcbc"
#s = "bbbcaba"
print(delete_duplicates_from_string_lexographical(s))

• @sankeerth456 Can't find a better solution than brutal.

• Javascript, O(n) solution.
My approach is very naive.

1. Count all element
2. Find min/max character
3. Delete [begin,min) iff occurrence(element)>1
4. Delete [max, end) iff occurrence(element)>1
5. Delete [min, max) iff occurrence(element)>1
const ALPHABET = "abcdefghijklmnopqrstuvwxyz";
const deleteDup = (s)=> {
var ary = s.split("")
, d = ary.reduce((d,c)=> {d[c]=(d[c]||0)+1;return d;}, {})
, min = ary.reduce((min,c)=> Math.min(min, ALPHABET.indexOf(c)), 99)
, max = ary.reduce((max,c)=> Math.max(max, ALPHABET.indexOf(c)), -1)
, p = s.indexOf(ALPHABET[min])
, q = s.lastIndexOf(ALPHABET[max])
, clearf=(i)=>{
if(d[ary[i]]>1) {
d[ary[i]]--;
ary[i]=null;
}
}
;

d[ary[p]]=Number.MAX_VALUE;
d[ary[q]]=Number.MAX_VALUE;

for(let i=p-1;i>=0;--i) clearf(i);
for(let i=q+1;i<ary.length;++i) clearf(i);
for(let i=p+1;i<q;++i) clearf(i);

return ary.filter(c=>!!c).join("");
};

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.