Yesterday, I wrote a story on ProgrammableWeb (An Online Dialogue to Shape Recovery.gov) to educate readers about recovery.gov (the government website intended to let Americans track the spending of funds from the American Recovery and Reinvestment Act of 2009, the "Stimulus Package") and to draw attention to a “national dialogue” running this week (until May 3) to solicit ideas aimed at answering the key question:
What ideas, tools, and approaches can make Recovery.gov a place where all citizens can transparently monitor the expenditure and use of recovery funds?
I've been reading some of the ideas presented so far, voted on a couple, and added comments to two. In response to the proposal XML Web Services ("Make recovery data available as a web service via SOAP XML."), I wrote:
I agree that some type of rigorous programmatic interface that allows developers to access the data from recovery.gov is essential. I think that SOAP and the rest of the associated WS-* stack might be one way to implement such access mechanisms, but I would not want SOAP to be the exclusive protocol used. I would argue, for instance, that a RESTful approach is also an excellent alternative to consider for recovery.gov.
On a front closer to our own work, in response to Making stimulus spending data accessible to the public, I wrote:
I'm one of the Berkeley researchers mentioned above involved with making recommendations on how data feeds should be used to make the recovery more transparent (see http://www.ischool.berkeley.edu/newsandevents/news/20090417recoveryguidelines and http://isd.ischool.berkeley.edu/stimulus/2009-029/)
Although some (but not all) agencies receiving and disbursing recovery funds are using feeds in their reporting (see a list that we compiled at http://isd.ischool.berkeley.edu/stimulus/feeds/feeds.html), the best data on dollars appropriated, obligated, or spent is in the Excel spreadsheets. Although there are apparently templates for the reports, they keep changing format, and there's nothing to stop agencies from inserting extra fields or omitting others. We know this for a fact: we've written programs to scrape the data from the spreadsheets, and it is a challenge to keep up with changes that keep breaking our scripts.
The federal government should publish the data as XML feeds in the first place (backed by a schema so that consumers can check that the data is valid), instead of making people who want to use that data scrape it out of Excel in a highly fragile process.
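To make the point concrete, here is a minimal sketch of the kind of check a schema-backed feed would enable. The feed structure and field names below are entirely hypothetical (recovery.gov publishes no such feed); in practice one would validate against a real XSD, but even a simple required-fields check, which is all this sketch does, would catch the inserted and omitted columns that break our scrapers:

```python
import xml.etree.ElementTree as ET

# Hypothetical recovery-report feed; the element and field names here are
# illustrative assumptions, not an actual recovery.gov format.
FEED = """<?xml version="1.0"?>
<recoveryReport>
  <entry>
    <agency>Department of Example</agency>
    <appropriated>1000000</appropriated>
    <obligated>250000</obligated>
  </entry>
</recoveryReport>"""

# Fields every entry must carry (an assumed reporting template).
REQUIRED = {"agency", "appropriated", "obligated"}

def validate_entries(xml_text):
    """Return a list of (entry index, sorted missing field names) problems."""
    root = ET.fromstring(xml_text)
    problems = []
    for i, entry in enumerate(root.findall("entry")):
        present = {child.tag for child in entry}
        missing = REQUIRED - present
        if missing:
            problems.append((i, sorted(missing)))
    return problems

print(validate_entries(FEED))  # an empty list means every entry is complete
```

With a published schema, this kind of check runs once at ingest time; with ad hoc spreadsheets, every format drift instead surfaces as a broken scraper downstream.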
As I wrote yesterday, it will be interesting to see how well the recovery.gov site actually does at aggregating a large number of proposals and surfacing the best ones. Moreover,