Two solutions and thoughts


  • 71

    My first solution produces short URLs like http://tinyurl.com/0, http://tinyurl.com/1, etc., in that order.

    class Codec:

        def __init__(self):
            self.urls = []

        def encode(self, longUrl):
            # The code is simply the index of this long URL in the list.
            self.urls.append(longUrl)
            return 'http://tinyurl.com/' + str(len(self.urls) - 1)

        def decode(self, shortUrl):
            # Read the index back from the last path segment.
            return self.urls[int(shortUrl.split('/')[-1])]
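
    A quick round trip with this class might look like this (the long URL here is just an example):

    codec = Codec()
    tiny = codec.encode('https://leetcode.com/problems/encode-and-decode-tinyurl/')
    print(tiny)                # http://tinyurl.com/0
    print(codec.decode(tiny))  # prints the original long URL again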
    

    Using increasing numbers as codes like that is simple, but it has some disadvantages, which the solution below fixes:

    • If I'm asked to encode the same long URL several times, it will get several entries. That wastes codes and memory.
    • People can find out how many URLs have already been encoded. Not sure I want them to know.
    • People might try to get special numbers by spamming me with repeated requests shortly before their desired number comes up.
    • Using only digits means the codes grow unnecessarily long: digits alone offer only a million codes with length 6 (or smaller), while using digits or lower or upper case letters would offer (10+26*2)^6 = 62^6 = 56,800,235,584 codes with length 6.

    The following solution doesn't have these problems. It produces short URLs like http://tinyurl.com/KtLa2U, using a random code of six digits or letters. If a long URL is already known, the existing short URL is used and no new entry is generated.

    import random
    import string

    class Codec:

        alphabet = string.ascii_letters + '0123456789'
    
        def __init__(self):
            self.url2code = {}
            self.code2url = {}
    
        def encode(self, longUrl):
            while longUrl not in self.url2code:
                code = ''.join(random.choice(Codec.alphabet) for _ in range(6))
                if code not in self.code2url:
                    self.code2url[code] = longUrl
                    self.url2code[longUrl] = code
            return 'http://tinyurl.com/' + self.url2code[longUrl]
    
        def decode(self, shortUrl):
            return self.code2url[shortUrl[-6:]]
    

    It's possible that a randomly generated code has already been generated before. In that case, another random code is generated instead. Repeat until we have a code that's not already in use. How long can this take? Well, even if we get up to using half of the code space, which is a whopping 62^6 / 2 = 28,400,117,792 entries, then each code has a 50% chance of not having appeared yet. So the expected/average number of attempts is 2, and for example only one in a billion URLs takes more than 30 attempts. And if we ever get to an even larger number of entries and this does become a problem, then we can just use length 7. We'd need to anyway, as we'd be running out of available codes.
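
    For reference, a quick sanity check of those numbers (a small sketch; it assumes each attempt collides independently with probability equal to the fill fraction):

    # Back-of-the-envelope check of the claims above.
    p = 0.5               # half of the 62**6 codes are already in use
    print(62**6 // 2)     # 28400117792 entries at that point
    print(1 / (1 - p))    # expected number of attempts: 2.0 (geometric distribution)
    print(p ** 30)        # chance of needing more than 30 attempts: ~9.3e-10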


  • 1
    F

    Nice Solution!
    But I don't really think that random functions are necessary.


  • 32
    S

    Same solution in Java. Thanks to @StefanPochmann and @louistang0909 for pointing out a couple of issues:

    public class Codec {
        Map<String, String> index = new HashMap<String, String>();
        Map<String, String> revIndex = new HashMap<String, String>();
        static String BASE_HOST = "http://tinyurl.com/";
        
        // Encodes a URL to a shortened URL.
        public String encode(String longUrl) {
            if (revIndex.containsKey(longUrl)) return BASE_HOST + revIndex.get(longUrl);
            String charSet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
            String key = null;
            do {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 6; i++) {
                    int r = (int) (Math.random() * charSet.length());
                    sb.append(charSet.charAt(r));
                }
                key = sb.toString();
            } while (index.containsKey(key));
            index.put(key, longUrl);
            revIndex.put(longUrl, key);
            return BASE_HOST + key;
        }
    
        // Decodes a shortened URL to its original URL.
        public String decode(String shortUrl) {
            return index.get(shortUrl.replace(BASE_HOST, ""));
        }
    }
    

  • 1

    (fixed now) @selim Not quite the same. When I ask your code to repeatedly encode the same longUrl, it creates separate entries. And in case of collisions, you build much longer URLs like http://tinyurl.com/gQsUMeuYytGl.


  • 1
    S

    @StefanPochmann You're right. I forgot to check if the map already contains the encoded string. Also fixed the issue in case of collisions. Thanks for pointing it out!


  • 0
    L

    @selim I think you need another HashMap that stores the long URLs as keys and their associated codes as values.

    Right now, the if (map.containsKey(longUrl))... check doesn't make sense.


  • 0
    S

    @louistang0909 You're absolutely right. There has to be a reverse index.


  • 0
    L

    Just curious, is there any reason to choose a key length of 6? Why not less or more than 6?


  • -1
    M

    I guess we can simply use the built-in hash function in Python to give us the short URL and then use it as the key to store the long URLs. This way we don't have to store two dicts.
    Here is my simple solution:

    class Codec:

        urlDict = {}

        def encode(self, longUrl):
            # Use Python's built-in hash of the long URL as the code (stored as a string).
            hsh = str(hash(longUrl))
            if hsh not in self.urlDict:
                self.urlDict[hsh] = longUrl
            return "http://tinyurl.com/" + hsh

        def decode(self, shortUrl):
            # "http://tinyurl.com/" is 19 characters, so the code starts at index 19.
            return self.urlDict.get(shortUrl[19:], "")

  • 0
    S

    @lugok It's a magic number, which is a code smell. However, check out the companion system design problem [Design TinyURL]. The first requirement is:

    "The identifier (the highlighted part) can be any string with 6 alphanumeric characters containing 0-9, a-z, A-Z."


  • 0
    L

    @selim I missed it... thanks.


  • 3

    Using a combination of System.nanoTime() and hashCode().
    It produces URLs in this format: https://tinyurl.com/596293762700519
    EDIT: (I think instead of nanoTime(), a sequence number would be enough to handle collisions between equal hash codes.)

    HashMap<String,String> tinyUrlMap = new HashMap<>();
    HashMap<String,String> longUrlMap = new HashMap<>();
    String tinyPrefix = "https://tinyurl.com/";

    // Encodes a URL to a shortened URL.
    public String encode(String longUrl) {
        if(longUrlMap.containsKey(longUrl)) return longUrlMap.get(longUrl);
        String tinyUrl = tinyPrefix+(longUrl.hashCode()+System.nanoTime());
        tinyUrlMap.put(tinyUrl,longUrl);
        longUrlMap.put(longUrl,tinyUrl);
        return tinyUrl;
    }
    
    public String decode(String shortUrl) {
        return tinyUrlMap.get(shortUrl);
    }

  • 0

    For your 2nd solution, if you use a Bloom filter, it should quickly find an unused short URL and use that.


  • 0

    @pinkfloyda Would it be quicker than what I have? How much?


  • 0
    L

    @pinkfloyda Would you share the code please? I'd like to learn.


  • 1

    @LucianoKlein No, I don't have the code; I was just speaking in a general sense. A Bloom filter is efficient at deciding that something is NOT in the set, with the trade-off of some false positives (something is in fact NOT in the set, but the Bloom filter thinks it is). A good Bloom filter requires good choices of hash functions, which I didn't spend more time researching.


  • 0

    @pinkfloyda I think a Bloom filter would be a good choice here if the map keeps growing really huge and starts having a lot of collisions. The advantage of a Bloom filter is that it uses constant space, but the disadvantage in this case is that some codes get skipped even though they were never actually used before.

    Browsers can use Bloom filters to check for spam URLs since they take less space. If the URL is "not in" the Bloom filter, proceed to visit it; otherwise, cross-check with your server again to verify whether it's really spam.
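
    In case a concrete picture helps, here's a minimal toy sketch of the idea (my own illustration, not from this thread; the array size and the sha256-based hash functions are arbitrary choices). A False answer means the item was definitely never added; a True answer may be a false positive and would still need a check against the real map:

    import hashlib

    class BloomFilter:
        # Toy Bloom filter: k hash positions over an m-slot bit array.
        def __init__(self, m=1_000_003, k=3):
            self.m = m
            self.k = k
            self.bits = bytearray(m)   # one byte per "bit" for simplicity

        def _positions(self, item):
            for i in range(self.k):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = 1

        def might_contain(self, item):
            # False: definitely absent. True: probably present (possible false positive).
            return all(self.bits[pos] for pos in self._positions(item))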


  • 0
    G

    Why not use length-6 permutations of the printable characters as the key? Then there will be no collisions any more.

    import itertools
    import string

    class Codec:
        
        def __init__(self):
            self.pool = itertools.permutations(string.printable, 6)
            self.cache = {}
        
        def encode(self, longUrl):
            key = next(self.pool)
            self.cache[key] = longUrl
            return key
            
        def decode(self, shortUrl):
            key = shortUrl
            return self.cache[key]
    

  • 0

    @greedisgood-1000000 said in Two solutions and thoughts:

    Why not

    Because it has three out of the four disadvantages I mention in my first post.

    And I don't believe that, for example, the newline character is allowed in URLs.

    And the "short URLs" you produce aren't URLs. Not even strings. They're tuples.

    Also, why not use product instead of permutations?
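
    For what it's worth, here's a sketch of what that could look like with product over a URL-safe alphabet (it still has the sequential-code drawbacks listed at the top of this thread; it only fixes the alphabet and the tuple-vs-string problems):

    import itertools
    import string

    class Codec:

        def __init__(self):
            alphabet = string.ascii_letters + string.digits    # URL-safe characters only
            self.pool = itertools.product(alphabet, repeat=6)  # 62**6 sequential codes
            self.code2url = {}

        def encode(self, longUrl):
            code = ''.join(next(self.pool))   # product yields tuples; join into a string
            self.code2url[code] = longUrl
            return 'http://tinyurl.com/' + code

        def decode(self, shortUrl):
            return self.code2url[shortUrl[-6:]]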


  • 0
    D

    Random is not good; we can use a hash function to get a checksum of the URL, and if we get a conflict, we can generate another checksum.
    But when we decode the URL, we just look the key up in the database, so it's not really a "decode".
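
    A rough sketch of that idea (my own illustration: an md5 checksum truncated to 6 hex characters, re-hashed with a salt on conflict; as noted, decode is then just a dictionary lookup):

    import hashlib

    class Codec:

        def __init__(self):
            self.code2url = {}
            self.url2code = {}

        def encode(self, longUrl):
            if longUrl in self.url2code:
                return 'http://tinyurl.com/' + self.url2code[longUrl]
            salt = 0
            while True:
                code = hashlib.md5(f"{longUrl}#{salt}".encode()).hexdigest()[:6]
                if code not in self.code2url:
                    break
                salt += 1   # conflict: hash again with a different salt
            self.code2url[code] = longUrl
            self.url2code[longUrl] = code
            return 'http://tinyurl.com/' + code

        def decode(self, shortUrl):
            return self.code2url[shortUrl[-6:]]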

