Today, my colleagues Erik Wilde, Eric Kansa, and I are pleased to announce our new report "Web Services for Recovery.gov" and its companion website recovery.berkeley.edu. Last week, the redesign of Recovery.gov was made public to much fanfare. Recovery.gov is the U.S. government’s official website for publicly documenting how funds from the American Recovery and Reinvestment Act of 2009 (ARRA) have been allocated and spent. Our work focuses on a crucial aspect of Recovery.gov that has yet to receive sufficient attention, namely, how data on Recovery Act spending will be made available in machine-readable form for analysis, interpretation, and visualization by third-party applications. In our report and on our website, we propose a reporting architecture, provide some sample feeds based on that architecture, and demonstrate how that data could be used in a simple map-based mashup.
Here are some highlights from our report, which I quote (with a bit of editing):
- Design priorities for recovery.gov need to shift from focusing on deploying an attractive Web site toward designing ARRA web services to support reuse of data in third-party applications.
- These services should allow any party to receive the complete set of ARRA reporting data in a timely and easily usable manner, so that in principle, the full functionality of Recovery.gov could be replicated by a third party.
- Our proposed architecture is based on the principles of Representational State Transfer (REST) and on always using the simplest and most widely known and supported technology for any given task.
- We recommend the feed-based dissemination of ARRA reporting data using the most widely used technologies on the Internet today: HTTP for service access, Atom for the service interface, and XML for the data provided by the service. This approach allows access from sophisticated server-based applications or from resource-constrained devices such as mobile phones.
- The manner in which data flows from FederalReporting.gov to Recovery.gov is of critical importance. Ideally, Recovery.gov should use Web services offered by FederalReporting.gov.
- We strongly recommend that Recovery reporting systems adopt the Atom syndication format for feeds. Feeds represent a major positive development in making government data more open to citizen review and reuse, and they are uniquely suited to this because they are useful to both humans and machines.
- While not formally standardized, feed autodiscovery is well supported by current browsers and could be implemented reliably with a well-defined set of implementation guidelines for Web pages offered by Recovery.gov.
- We strongly recommend making feed paging and archiving mandatory, so that the feeds are not just a temporary way of communicating that information has become available. Instead, the feed pages should be available as persistent and permanent access points, so that accessing information via feeds can be done robustly and reliably.
- ARRA data dissemination services should be more resource-oriented than service-oriented. XML representations should contain links (in the form of URIs) to related data resources, thereby representing the relationships between the different concepts which are relevant for reporting.
- The Recovery reporting schema uses many different coding systems and identifiers. Publication of resources related to some of these identifiers will be of great value. (We list key identifiers in the report.)
- There are many possible analyses that people may wish to perform on Recovery data, making it difficult to accommodate them all. Therefore, querying services should be oriented toward making machine-readable representations of data available, so that third party developers can easily populate their own analysis engines and run their own specialized algorithms on that data.
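
To make the feed-related recommendations above a little more concrete, here is a minimal sketch in Python of how a third-party client might harvest a complete data set from an archived Atom feed. It is only an illustration under stated assumptions, not the architecture from the report: every URL, identifier, and title in it is hypothetical, the feed pages are held in memory rather than fetched over HTTP, and the `harvest`, `fetch`, and `link_href` helpers are names invented for this example. The `prev-archive` link relation comes from RFC 5005 (Feed Paging and Archiving); the client follows it from the current feed back through older archive pages and keeps the newest version of each entry.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed documents keyed by URL. A real client would issue
# HTTP GET requests instead; these URLs and entries are made up.
PAGES = {
    "http://example.org/arra/feed": """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>ARRA awards (hypothetical)</title>
  <link rel="self" href="http://example.org/arra/feed"/>
  <link rel="prev-archive" href="http://example.org/arra/feed/archive/1"/>
  <entry>
    <id>urn:example:award:3</id>
    <title>Award 3</title>
    <updated>2009-10-05T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/arra/awards/3"/>
  </entry>
  <entry>
    <id>urn:example:award:2</id>
    <title>Award 2 (revised)</title>
    <updated>2009-10-04T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/arra/awards/2"/>
  </entry>
</feed>""",
    "http://example.org/arra/feed/archive/1": """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>ARRA awards (hypothetical), archive 1</title>
  <link rel="self" href="http://example.org/arra/feed/archive/1"/>
  <link rel="current" href="http://example.org/arra/feed"/>
  <entry>
    <id>urn:example:award:2</id>
    <title>Award 2</title>
    <updated>2009-10-02T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/arra/awards/2"/>
  </entry>
  <entry>
    <id>urn:example:award:1</id>
    <title>Award 1</title>
    <updated>2009-10-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/arra/awards/1"/>
  </entry>
</feed>""",
}


def fetch(url):
    """Stand-in for an HTTP GET: return the Atom document stored for a URL."""
    return PAGES[url]


def link_href(element, rel):
    """Return the href of the first atom:link child with the given rel, or None."""
    for link in element.findall(ATOM + "link"):
        # Atom treats a missing rel attribute as rel="alternate".
        if link.get("rel", "alternate") == rel:
            return link.get("href")
    return None


def harvest(start_url):
    """Walk from the current feed back through its archives, newest first."""
    entries = {}
    seen = set()
    url = start_url
    while url and url not in seen:  # guard against link cycles
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            entry_id = entry.findtext(ATOM + "id")
            # Pages are visited newest first, so the first copy of an
            # entry is the most recent one; older copies are skipped.
            if entry_id not in entries:
                entries[entry_id] = {
                    "title": entry.findtext(ATOM + "title"),
                    "updated": entry.findtext(ATOM + "updated"),
                    "data": link_href(entry, "alternate"),
                }
        url = link_href(feed, "prev-archive")
    return entries


if __name__ == "__main__":
    for entry_id, record in sorted(harvest("http://example.org/arra/feed").items()):
        print(entry_id, record["title"], record["data"])
```

Because each entry carries a link to a separate data resource, a client like this could load the linked XML representations into its own database or analysis engine. Permanently available archive pages are what make such a complete harvest possible long after the entries were first published.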