Tuesday, February 13, 2007

Open your Data!

[Warning: long, long rant]

I'm scratching a rather irritating itch. I'd like to travel around Bangalore, but I don't have the money to move around in rickshaws, nor can I read Kannada to figure out where a particular bus goes (no dual-language boards here!). After being pampered by BEST, I'm finding the BMTC a royal pain.

I've bought maps, guide books, route maps and time tables, but they've all been "designed" without any consideration for the most common use case. Only know the stop names? You're out of luck. No source lists stop names, only stage names (which is what BEST does too, BTW).

Well, not exactly no source: there is one source with stop-level granularity. The superbly detailed large-format BMTC route map (50 bucks!) is the only such source, but nobody intended it to be used that way. In fact, it's a mystery to me how people are supposed to use this thing. Only a laborious search through the entire map will reveal the locations of the stops, and once that's done, correlating that information with route numbers is next to impossible. The map even wastes precious space on "segment numbers" (which I guess actually constitute routes), but in its present format that information is intelligible only to somebody on the inside; nothing else on the map refers to it.

Well, this is Bangalore, the IT capital of India, so of course the BMTC provides a web-based search interface. Except that it sucks big time. There are two versions: (and I'm guessing here) a "professional" JSP edition and a "final-year project" PHP version. Forget the technical mistakes, such as using POST for idempotent queries; the UI design is horrid. It's what most UI designers would call the perfect example of a programmer-designed interface: the UI reflects the code, instead of the other way around. Both "search engines" present drop-downs that offer only stage-to-stage information. Wow. I suppose everybody travels only from stage to stage.

So what do I do? I do what I do best. I'm now writing a search engine for finding information about buses. It's called busfinder, and it will initially provide "similar" functionality to the existing "search engines". Except, of course, it won't be tied to the BMTC. I intend it to be usable by any bus service. It will be small, fast, and featureful. It'll be written with the traveller in mind, not the backend database structure. I've got a little experience doing this sorta thing, and I'm hoping it will also clear up some of my programming blues.
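A stop-level lookup, the thing the existing tools lack, is not hard to sketch. The Python below is a minimal illustration, not busfinder's actual code: the route numbers and stop names are made up, and it assumes each direction of travel is stored as its own route. It indexes every stop by the routes that serve it, then keeps only the routes that reach the origin before the destination.

```python
from collections import defaultdict

# Made-up sample data: route number -> ordered list of stops.
# Assumption: each direction of a real route would be a separate entry.
ROUTES = {
    "R1": ["Stop A", "Stop B", "Stop C", "Stop D"],
    "R2": ["Stop E", "Stop D", "Stop F", "Stop G"],
    "R3": ["Stop H", "Stop F", "Stop C", "Stop B"],
}


def build_index(routes):
    """Map each stop to {route: position of that stop on the route}."""
    index = defaultdict(dict)
    for route, stops in routes.items():
        for pos, stop in enumerate(stops):
            index[stop][route] = pos
    return index


def direct_routes(index, origin, destination):
    """Return sorted routes that serve origin and then destination, in that order."""
    at_origin = index.get(origin, {})
    at_dest = index.get(destination, {})
    shared = at_origin.keys() & at_dest.keys()
    return sorted(r for r in shared if at_origin[r] < at_dest[r])


if __name__ == "__main__":
    index = build_index(ROUTES)
    print(direct_routes(index, "Stop B", "Stop D"))  # ['R1']
    print(direct_routes(index, "Stop B", "Stop F"))  # [] (R3 runs the other way)
```

Keying the index by stop rather than by stage is the whole point: the traveller asks about the stop they're standing at, and the data answers that question directly.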

Which brings me to the real topic of this post: data. Everyone who's spent some time in this industry knows that programs are a dime a dozen. It's your data that's actually valuable. That's why vendors prefer proprietary data formats: once you're locked in, they know you'll be at their mercy forever. And that's why there's an entire data-conversion industry.

But for my search engine, I need data. The BMTC could do me a wonderful service by giving me access to their data in machine-readable form (maybe they will if I ask, but I haven't and won't). For now, I've had to scrape their HTML pages. And as I've found, their data sucks. There are missing routes, incorrect stop names, and inconsistent information that all stink to high heaven of five-buck-an-hour typists and even cheaper database architects (normalization, anyone?).
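To make the normalization jab concrete: in a normalized schema each stop is stored exactly once and routes refer to it by id, so fixing a misspelled stop name is a single-row update rather than a hunt through every route that passes it. The following is a hypothetical sketch using Python's built-in sqlite3 module; the table and column names, and the whitespace clean-up, are my own assumptions, not the BMTC's schema.

```python
import sqlite3

# Hypothetical normalized schema: stops and routes each stored once,
# with route_stop recording the ordered sequence of stops on a route.
SCHEMA = """
CREATE TABLE stop  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE route (id INTEGER PRIMARY KEY, number TEXT NOT NULL UNIQUE);
CREATE TABLE route_stop (
    route_id INTEGER NOT NULL REFERENCES route(id),
    seq      INTEGER NOT NULL,
    stop_id  INTEGER NOT NULL REFERENCES stop(id),
    PRIMARY KEY (route_id, seq)
);
"""


def load(conn, routes):
    """Load {route number: [stop names]} into the normalized tables."""
    conn.executescript(SCHEMA)
    for number, stops in routes.items():
        route_id = conn.execute(
            "INSERT INTO route (number) VALUES (?)", (number,)).lastrowid
        for seq, name in enumerate(stops):
            clean = " ".join(name.split())  # collapse stray whitespace
            # Reuse the existing row if this stop was already inserted.
            conn.execute("INSERT OR IGNORE INTO stop (name) VALUES (?)", (clean,))
            (stop_id,) = conn.execute(
                "SELECT id FROM stop WHERE name = ?", (clean,)).fetchone()
            conn.execute("INSERT INTO route_stop VALUES (?, ?, ?)",
                         (route_id, seq, stop_id))
    conn.commit()


def routes_through(conn, stop_name):
    """Return the sorted route numbers that serve the given stop."""
    rows = conn.execute("""
        SELECT DISTINCT r.number FROM route r
        JOIN route_stop rs ON rs.route_id = r.id
        JOIN stop s ON s.id = rs.stop_id
        WHERE s.name = ?
        ORDER BY r.number
    """, (stop_name,))
    return [number for (number,) in rows]
```

Usage is just `conn = sqlite3.connect(":memory:")`, then `load(conn, routes)` and `routes_through(conn, "Some Stop")`.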

And then I had this idea, inspired by their application, to make their route maps available via Google Maps (which now has street-level data for Bangalore, though it's not yet searchable, aargh!). But there seems to be no public-domain (or even freely licensed) geo-referenced information available for Bangalore. The info exists (see http://traffic.mapunity.org, functionally similar to busfinder but with the same UI problems, and http://www.janaagraha.org/jmap/), but none of it is available for download. It's all locked up behind the respective applications. No APIs either, so no mash-ups are possible.

Which is why I make this appeal. Free your data! Make it available in a machine-readable format with a liberal license for use by the general public. If you're a government institution or somebody without much computer expertise, tap the Internet! Release your data and watch the brilliant minds of the 'Net breathe life into it. Remember, there is always a better program. But you may have the best data.

7 comments:

Abhijit Pai said...

Wikimapia (http://www.wikimapia.org/), maybe.

ksp said...

Wikimapia doesn't have street-level detail, unfortunately. And there's a bigger problem (from the Wikimapia article on Wikipedia):

" ... the site does not describe the licensing terms of user-contributed information and does not provide a way for third parties to download the contributed information."


Another problem is that there's a huge shortage of GPS devices for doing this sort of thing in India.

Thankfully, the geographic community in India is asking for the Survey of India to start releasing maps.

rohan_nog said...

As Calvin's dad would say, "It'll build character."

rohan_nog said...

I just saw those 2 sites. Man, the PHP one is bad! What on earth do those letters mean? I also saw the other site, and isn't there a stop for IISc?

ksp said...

The letters are the first letters of the stage names, to prevent "overload". I guess these guys never use their own program.

As for the Institute not being there, well, it's there as "Indian Institute Layout", or that's my guess, since there are so many Indian Institutes here.

Akshay Surve said...

Sree,

I am sure the freemap initiative is something to look forward to.

http://freemap.in/

There is a Mumbai freemap community. From what I know, a Bangalore freemap community may also come up soon (or it's already up somewhere).

- Akshay

Do check the wiki; it has some interesting things up there.

ksp said...

akky: where's the data, brother?

I'd like to download co-ordinate data, and then add that info to my database. But it doesn't seem to be there.

Also their wiki seems to have been spammed.

I did know about freemap.in; some of my friends are working on it. See Indictrans there?