An implementaion of Inverted Index


  • 0
    J

    I know there are a lot of perfect implementation here with hash table or trie. I just want to try it with Inverted Index, which is the real fulltext search solution used by most search engine.

    https://en.wikipedia.org/wiki/Inverted_index

    We give each word a unique id when they are added. Then index the id with the char in the word and the positions of the chars in word. When there comes an search, we find the indexed ids with corresponding char and char position. Thus we got several id sets. Then we have to find out the intersection of the sets.

    There are some annoying cases with "all dot" search. Maybe the length of the word should also be indexed.

    Just for another way of thought.

    public class WordDictionary {
        ArrayList<String> l;
    	ArrayList<HashMap<Character, ArrayList<Integer>>> index;
    	Set<Integer> s;
        /** Initialize your data structure here. */
        public WordDictionary() {
            index = new ArrayList<HashMap<Character, ArrayList<Integer>>>();
            l = new ArrayList<String>();
            s = new HashSet<Integer>();
        }
        
        /** Adds a word into the data structure. */
        public void addWord(String word) {
            l.add(word);
            s.add(word.length());
            while(index.size()<word.length()) index.add(new HashMap<Character, ArrayList<Integer>>());
            
            for(int i=0; i<word.length(); i++){
                HashMap<Character, ArrayList<Integer>> hm = index.get(i);
                if(!hm.containsKey(word.charAt(i))) hm.put(word.charAt(i), new ArrayList<Integer>());
                hm.get(word.charAt(i)).add(l.size()-1);
            }
        }
        
        /** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
        public boolean search(String word) {
            if(word.length()>index.size()) return false;
            
            boolean allDot = true;
            ArrayList<ArrayList<Integer>> comp = new ArrayList<ArrayList<Integer>>();
    
            for(int i=0; i<word.length();i++){
                char c = word.charAt(i);
                if(c!='.'){
                    allDot = false;
                    if(i>=index.size() || !index.get(i).containsKey(c)){
                        return false;
                    }
                    comp.add(index.get(i).get(c));
                }
            }
            if(allDot){
                return s.contains(word.length());
            }
            if(comp.size()==0 || (comp.size()==1) && comp.get(0).size()==0) return false;
            
            //Find the intersecion
            boolean found = false;
            int[] p = new int[comp.size()];
            int mx = comp.get(0).get(0);
            int compL = comp.size();
            while(!found){
                found = true;
                for(int i = 0; i<compL; i++){
                    ArrayList<Integer> cur = comp.get(i);
                    while(p[i]<cur.size()){
                        int v = cur.get(p[i]);
                        if(v<mx) p[i]++;
                        else if(v>mx){
                            found = false;
                            mx = v;
                        }else{
                            break;
                        }
                    }
                    
                    if(p[i]==cur.size()) return false;
                }
                
                if(found && l.get(mx).length()!=word.length()){
                    found = false;
                    mx = mx+1;
                }
            }
            return true;
        }
    }
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.