Short URLs

Microblogging makes short URLs important. There are a bunch of shortening services, but I prefer http://is.gd because it’s five characters long.

Services

Others I’ve found and used include

A number of people publish top 10 lists; you can use Google to find them.

Problem definition

I am interested in writing or implementing a URL shortener, for use inside the firewall, or in using one through an API. It seems quite simple until you consider performance at scale. My first thought was to use a database with an identifier/sequence data type as the key to the hashing algorithm, since it’s best to keep a record of the URLs issued, so that if people ask twice for the same URL the existing short URL can be reused.
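A minimal sketch of that first thought, in Python. It makes several assumptions: in-memory dictionaries stand in for the database tables, a simple counter stands in for the sequence column, and a base-62 encoding of the sequence number is swapped in for the hash (it is not a hash, but it gives short, unique codes). The names `encode_id` and `Shortener` are made up for illustration.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (an assumed alphabet, chosen for illustration)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_id(n):
    """Turn a non-negative sequence number into a short base-62 string."""
    if n < 0:
        raise ValueError("sequence numbers must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class Shortener:
    """In-memory stand-in for the database of issued URLs."""

    def __init__(self):
        self._next_id = 1          # plays the role of the sequence column
        self._code_by_url = {}     # long URL -> short code (for reuse)
        self._url_by_code = {}     # short code -> long URL (for resolving)

    def shorten(self, url):
        # Reuse the existing code if this URL was already issued.
        existing = self._code_by_url.get(url)
        if existing is not None:
            return existing
        code = encode_id(self._next_id)
        self._next_id += 1
        self._code_by_url[url] = code
        self._url_by_code[code] = url
        return code

    def resolve(self, code):
        # Returns None when the code was never issued.
        return self._url_by_code.get(code)
```

Asking twice for the same URL returns the same code, which is the reuse behaviour described above. In a real database the counter and both lookups would be a sequence and two indexed columns.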

The second part is resolving the short URL on request. Is this just an Apache solution with a very large redirect file?
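To make the two options concrete, here is a rough Python sketch. `render_redirect_file` writes one Apache mod_alias `Redirect permanent` line per short code, which is one possible form for the "very large redirect file". `resolve` is the alternative, a plain lookup that an application would do itself. Both function names and the code-to-URL mapping are assumptions for illustration, and whether Apache copes well with a very large file of this kind remains the open question.

```python
def render_redirect_file(mapping):
    """Render a code -> URL mapping as Apache 'Redirect permanent' lines."""
    lines = [
        "Redirect permanent /{} {}".format(code, url)
        for code, url in sorted(mapping.items())
    ]
    return "\n".join(lines) + "\n"


def resolve(mapping, path):
    """Look up a request path such as '/a1' and return (status, location)."""
    code = path.lstrip("/")
    url = mapping.get(code)
    if url is None:
        return 404, None   # unknown short code
    return 301, url        # permanent redirect to the long URL
```

The first approach pushes all the work onto the web server's configuration; the second needs a small service but lets the mapping live in a database or cache.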

Some research

I searched Google for “url shortner howto” to see what it came up with.

Two interesting finds included

Ideas

Idea No 1 was to use a database identifier type and hash it. I was going to investigate whether we could use a web server redirection file, or whether I’d have to write something using a database retrieval. However, both of these involve database transactions, which introduce serial bottlenecks and inhibit scalability.

So can we pre-allocate the short URLs in blocks, and allocate them using Hadoop? Is Hadoop a sensible solution for “find me the next unused”, or perhaps it doesn’t matter; it’s just “find me one unused slot”?
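The block idea can be sketched without Hadoop at all. In the Python below, a hypothetical `BlockAllocator` hands out ranges of sequence numbers under a lock (the only serial step), and each hypothetical `Worker` then issues numbers from its own block with no further coordination. This is only an illustration of "find me one unused slot"; the block size, the class names, and the use of a thread lock in place of a shared database or coordination service are all assumptions.

```python
import threading


class BlockAllocator:
    """Hands out non-overlapping blocks of sequence numbers."""

    def __init__(self, block_size=1000, start=1):
        self._lock = threading.Lock()
        self._next_start = start
        self._block_size = block_size

    def reserve(self):
        # The only serialised step: claim the next block of numbers.
        with self._lock:
            begin = self._next_start
            self._next_start += self._block_size
        return range(begin, begin + self._block_size)


class Worker:
    """Issues numbers from its own block; asks for a new block when empty."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._ids = iter(())   # start with no block reserved

    def next_id(self):
        try:
            return next(self._ids)
        except StopIteration:
            # Current block is used up (or none yet): reserve another.
            self._ids = iter(self._allocator.reserve())
            return next(self._ids)
```

With a block size of 1000, a worker touches the shared allocator once per thousand short URLs instead of once per URL. The cost is that numbers are no longer issued in strict order, and any unused part of a block is wasted if a worker stops.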