By hagrin - Posted on 29 November 2006

Since my SEO Tool has been successfully collecting data and I haven't decided how to display the information, I decided to start collecting data for another project. I remember reading a discussion over whether sports statistics are in the public domain, and I personally believe that the data should be free and available to download. However, since it is not, and data distribution sites want big money for such information, I did what any good programmer would do - develop a web scraper to parse available data, restructure it and hopefully release it freely to the public (this last part is probably a no go, but I'll do the necessary research to figure out the potential copyright issues).

So, one would probably ask: once you have the data, what do you plan on doing with it? To be honest, the only real use I see outside of easy-to-use public distribution would be to provide the public a "system" for handicapping future contests. Many sites that post trends report statistics as simple averages, which can really mislead someone looking at the matchups. I believe that stronger trends exist when evaluating other factors not usually measured or taken into account, as well as when looking at the standard deviation of the data and potentially its median.
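To show why a plain average can mislead, here is a minimal Python sketch. The two teams and their scores are made up for illustration and do not come from any scraped data; `summarize` is a hypothetical helper name.

```python
from statistics import mean, median, pstdev

# Hypothetical points scored over five games (made-up numbers).
team_a = [100, 102, 98, 101, 99]   # consistent scorer
team_b = [70, 75, 72, 73, 210]     # one blowout inflates the average


def summarize(scores):
    """Return the mean, median and population standard deviation of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "stdev": pstdev(scores),
    }


for name, scores in (("team_a", team_a), ("team_b", team_b)):
    print(name, summarize(scores))
```

Both teams average the same number of points, but the median and standard deviation show that team_b usually scores far less and is much more erratic.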

I actually set myself a deadline for this one - January 1, 2007 - so check back around then for the SQL scripts needed and hopefully the web interface that allows people to evaluate certain matchups (although as I write that, I'm thinking that it might just be better to automatically evaluate all the games for a day and then rank them - no need for user intervention). Obviously, I could then track the accuracy of my predictions and tweak the formula as I spot trends that would improve it.
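A rough sketch of what "rank all the games for a day, then track accuracy" might look like in Python. Nothing here is the final formula: the `edge` field (how strongly the system favors its pick), the game ids and the team names are all placeholders I made up for illustration.

```python
def rank_games(games):
    """Sort games so the largest predicted edge (in absolute value) comes first."""
    return sorted(games, key=lambda g: abs(g["edge"]), reverse=True)


def accuracy(predictions, results):
    """Fraction of graded games where the predicted winner actually won.

    Both arguments map a game id to a team name. Games with no result
    yet are skipped; returns None if nothing has been graded.
    """
    graded = [predictions[g] == results[g] for g in predictions if g in results]
    return sum(graded) / len(graded) if graded else None


# Hypothetical slate of games for one day.
todays_games = [
    {"id": "g1", "pick": "Home A", "edge": 1.5},
    {"id": "g2", "pick": "Away B", "edge": -6.0},
    {"id": "g3", "pick": "Home C", "edge": 3.0},
]

ranked = rank_games(todays_games)
predictions = {g["id"]: g["pick"] for g in ranked}
results = {"g1": "Home A", "g2": "Home B"}  # g3 not played yet

print([g["id"] for g in ranked])
print(accuracy(predictions, results))
```

Keeping the predictions and the results as separate id-to-winner mappings means the accuracy can be recomputed any time the formula changes, without touching the scraped data.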