
Does anyone know of a complete and up-to-date list of Recovery Act accounts?

Does anyone know of a complete and up-to-date list of Recovery Act TAFS (Treasury Appropriation Fund Symbols), basically a list of all the basic accounts of money flowing from the Recovery Act? There was one published by ProPublica on April 1, 2009 (from the post Falling Short of Expectations So Far – ProPublica) and one buried in spreadsheets coming from the feds (e.g., in the worksheet entitled "92_AARP_TAFS_DD_Detail"). I've been working on synthesizing the two lists and updating them with the latest appropriation numbers that we can glean from scrapes of

I'm close to arriving at a list that I'm happy with.  However, this is the type of list that the feds must have, but one I've not been able to find.  Anyone know of one?

My project idea for the Freebase Hack Day

[Post in progress]

In this post, I will write about my project proposal for the upcoming Freebase HackDay.

The project is to elaborate the prototype at An org chart of the US Federal Government Based on OMB agency and bureau codes.

See what I've written at

I'm writing up a longer post right now, but let me list a few things I'd love help with:

1) To do the reconciliation of government agencies to Freebase, I built a primitive Acre app to help me apply Freebase Suggest to a lot of items: see source: and a background writeup of the idea: Refining this app would be very useful!

2) as part of the reconciliation process, coming up with a good way to figure out from the suggest API whether a given suggestion is given with high confidence or not would be helpful.  Tom Morris has some ideas in

3) Writing the data back from the reconciliation would be very useful. The data behind is. How should we model the OMB codes and apply them to the government agencies in Freebase? And what about the entities I couldn't find in Freebase: should we create new entities for them?

4) Re what Spencer wrote:  yes, I'd love to see someone come up with a better visualization than what I have at — especially if there is a generic viewer.

A first pass at an org chart for the US Federal government

Ever since I started trying to understand how the US Government works, I've been trying to find a chart that would list all the different departments, agencies, and other organizational entities that comprise the government and show how they are related to each other. I can't believe that I'd be the only person to find such an org chart useful; indeed, this idea is echoed in a project idea listed on the Sunlight Labs wiki as OPML the Federal Government:

Project Idea: This is a quick win– just create an OPML file of the existing structure of the Federal Government agencies in all branches.

As a step to creating such a representation, I've scraped the data in

Appendix C of OMB Circular No. A-11 (Sept 2008).

Under the MAX system, OMB assigns agency and bureau codes that are used to identify and access data in the budget database. The following table lists these codes in budget order. It also provides the corresponding agency codes assigned by Treasury. In certain instances, a different Treasury agency code may be used for some accounts in an agency; a complete listing can be found in the Budget Accounts Title (BAT) file.

I've uploaded this PDF to Scribd to make it easier for readers to see the data it contains:

OMB Circular a 11 Appendix c

I read this PDF into Adobe Acrobat 8, saved it as "XML 1.0", massaged the XML a bit by hand to make it easier to apply some XQuery to create a starter OPML 1.0 file, and then did some more manual editing to represent the data in the correct hierarchy to produce:

My working assumption is that OMB Agency/Bureau codes + Treasury Agency Codes provide the key to unlocking a significant part of the higher levels of the US Federal Government. More on this assumption later.
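To make the hierarchy step above more concrete, here is a minimal Python sketch of turning agency/bureau rows into nested OPML outlines. It is not the XQuery I actually used. The rows, the names, the attribute names (ombAgencyCode, ombBureauCode), and the convention that a bureau code of "00" marks the agency-level row are all assumptions made for illustration; the codes are placeholders, not values from Appendix C.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder rows: (agency_code, bureau_code, name).
# Assumption: a bureau code of "00" marks the agency-level row, and each
# agency row appears before its bureaus. Codes are NOT from Appendix C.
ROWS = [
    ("900", "00", "Example Department"),
    ("900", "05", "Example Bureau A"),
    ("900", "10", "Example Bureau B"),
    ("901", "00", "Example Independent Agency"),
]


def rows_to_opml(rows, title="US Federal Government (sketch)"):
    """Build an OPML 1.0 tree: one outline per agency, bureaus nested inside."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    agencies = {}  # agency_code -> that agency's <outline> element
    for agency_code, bureau_code, name in rows:
        if bureau_code == "00":
            # Agency-level row: a top-level outline under <body>.
            agencies[agency_code] = ET.SubElement(
                body, "outline", text=name, ombAgencyCode=agency_code
            )
        else:
            # Bureau row: nest it under the agency seen earlier.
            ET.SubElement(
                agencies[agency_code], "outline", text=name,
                ombAgencyCode=agency_code, ombBureauCode=bureau_code,
            )
    return opml


if __name__ == "__main__":
    print(ET.tostring(rows_to_opml(ROWS), encoding="unicode"))
```

Keeping the codes as attributes on each outline means the OPML can later be joined back to budget data by code rather than by name, which is the point of the working assumption above.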

You can see this OPML rendered by as such:

Some Possible Next Steps:


copyright status of White House photos on Flickr?

On the "noise list" at the School of Information at Berkeley, we recently got into a discussion about the copyright status of The Official White House Photostream's Photostream on Flickr.   Some of us would agree with the argument presented on the blog of the Creative Commons (Why Did the White House Choose Attribution and not Public Domain?) that

The photos are likely in the public domain because they are works created by the federal government and not entitled to copyright protection. As you might recall, the’s copyright notice indicates as much.

At present, Flickr doesn't allow an ordinary user to state that one of his pictures is in the public domain. I've been waiting for CC0 to be added to the list of CC licenses you can use. Of course, the White House isn't necessarily a regular Joe user, and there is already a structure in Flickr to handle public domain-ish photos: the Flickr Commons with its "no known copyright restrictions" provision. Perhaps putting White House photos in Flickr Commons won't quite work either. Would one be able to put images produced by the US government into the Flickr Commons in general?

BTW, one of the comments on the Creative Commons post pointed to in which we find the following stipulations:

These official White House photographs are being made available for publication by news organizations and/or for personal use printing by the subject(s) of the photographs. The photographs may not be used in materials, advertisements, products, or promotions that in any way suggest approval or endorsement of the President, the First Family, or the White House.

Given that these photos are arguably in the public domain (as argued in the blog post), are these stipulations legally enforceable?


Previous recommendations would say "open the data" to

As many have jumped into making recommendations on how Recovery data  should be packaged and disseminated, I'm reminded of some important previous work in this area.

The first is the ACM U.S. Public Policy Committee (USACM) Recommendations on Open Government. I have tremendous respect for the ACM as "the world’s largest educational and scientific computing society". The ACM U.S. Public Policy Committee (USACM) "serves as the focal point for ACM's interaction with U.S. government organizations, the computing community, and the U.S. public in all matters of U.S. public policy related to information technology." The policy statement on "open government" first sets the context for its recommendations:

Individual citizens, companies and organizations have begun to use computers to analyze government data, often creating and sharing tools that allow others to perform their own analyses. This process can be enhanced by government policies that promote data reusability, which often can be achieved through modest technical measures. But today, various parts of governments at all levels have differing and sometimes detrimental policies toward promoting a vibrant landscape of third-party web sites and tools that can enhance the usefulness of government data.

The recommendations  "for data that is already considered public information" are:

  • Data published by the government should be in formats and approaches that promote analysis and reuse of that data.
  • Data republished by the government that has been received or stored in a machine-readable format (such as online regulatory filings) should preserve the machine-readability of that data.
  • Information should be posted so as to also be accessible to citizens with limitations and disabilities.
  • Citizens should be able to download complete datasets of regulatory, legislative or other information, or appropriately chosen subsets of that information, when it is published by government.
  • Citizens should be able to directly access government-published datasets using standard methods such as queries via an API (Application Programming Interface).
  • Government bodies publishing data online should always seek to publish using data formats that do not include executable content.
  • Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.

The second is a set of Open Government Data Principles formulated in October 2007  by the Open Government Working Group,  "30 open government advocates gathered to develop a set of principles of open government data":

Government data shall be considered open if they are made public in a way that complies with the principles below:

1. Complete
All public data are made available. Public data are data that are not subject to valid privacy, security or privilege limitations.
2. Primary
Data are collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.
3. Timely
Data are made available as quickly as necessary to preserve the value of the data.
4. Accessible
Data are available to the widest range of users for the widest range of purposes.
5. Machine processable
Data are reasonably structured to allow automated processing.
6. Non-discriminatory
Data are available to anyone, with no requirement of registration.
7. Non-proprietary
Data are available in a format over which no entity has exclusive control.
8. License-free
Data are not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.

Compliance must be reviewable.

The third is the paper “Government Data and the Invisible Hand” (Yale Journal of Law & Technology 11: 160) by David Robinson, Harlan Yu, William Zeller, and Edward Felten. The abstract contains the following recommendation:

Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use….It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

On ProgrammableWeb last year, I distilled the paper's argument as follows:

The conclusion is based on a claim that the executive branch is comparatively ineffective at creating tools for presenting data and should therefore leave that work to a private sector (either nonprofit or commercial entities) that is best able to respond to a wide variety of possible uses for government data. That doesn’t mean that the government should provide no user interface to the data, but rather “should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data.” Fancier interfaces and tools should be built by others.

Moreover, the authors have recommended a specific mechanism for ensuring that the government does not privilege any user interface over their public data infrastructure: “require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.”

Let me now make sure that these recommendations are at least referenced somewhere at the "National Dialogue" around the Recovery.

Amazon Web services in education program

Next time I teach my Mixing and Remixing Information course, I'll probably apply for a grant from the AWS in Education program:

AWS in Education provides a set of programs that enable the worldwide academic community to easily leverage the benefits of Amazon Web Services for teaching and research. With AWS in Education, educators, academic researchers, and students can apply to obtain free usage credits to tap into the on-demand infrastructure of Amazon Web Services to teach advanced courses, tackle research endeavors and explore new projects – tasks that previously would have required expensive up-front and ongoing investments in infrastructure.

I'll be teaching a seminar on mashups at the Educause 2009 Annual Conference

I'm excited to be teaching a pre-conference seminar at the Educause 2009 Annual Conference. My proposal for running a half-day seminar, Creating and Enabling Web Mashups, was accepted. The seminar will take place on Tuesday, November 3, 2009, at 8:30 AM. I'm looking forward to spending some time in Denver.

Here's a short abstract for the session:

There are thousands of web mashups that recombine everything from Google Maps and Flickr with useful data drawn from multiple websites. Mashups are educational, fun, and even transformative. In this tutorial, you will begin to build mashups that address problems of interest to you. You will learn how to combine APIs and data into mashups. You will also learn how to let others recombine content from your website.

Here's a longer abstract:

The Web contains thousands of mashups that recombine everything from Google Maps, Flickr,, the New York Times  with useful information about travel, finance, real estate, and more. By fusing elements from multiple web sites, mashups are often informative, fun, and even transformative — representing the way the Web as a whole is heading.

In this hands-on tutorial, you will learn how to build basic mashups and how to develop mashups to address problems of interest to you.   You will learn how to exploit such web elements as URLs, tags, and RSS feeds in your mashups; and how to combine APIs and data into mashups.   You will also learn how to enable users to recombine content from your website.  Although the most sophisticated mashups demand a wide range of technical knowledge, anyone with a solid knowledge of HTML will be able to learn practical skills from this tutorial.


Congressional Oversight Panel, TARP, and Elizabeth Warren

I wish I had time to follow the TARP carefully; following the Stimulus already keeps me busy enough. However, I learned a lot from Jon Stewart's April 15 interview with Elizabeth Warren, the head of the Congressional Oversight Panel: Part 1 and Part 2.


Participating in the national online dialogue around

Yesterday, I wrote a story on ProgrammableWeb (An Online Dialogue to Shape to educate readers on (the government website aimed at letting Americans track the spending of money arising from the American Recovery and Reinvestment Act of 2009, the "Stimulus Package") and to draw attention to a “national dialogue” this week (until May 3) to solicit ideas aimed at answering the key question:

What ideas, tools, and approaches can make a place where all citizens can transparently monitor the expenditure and use of recovery funds?

I've been reading some of the ideas presented so far and voted on a couple.  I added comments to two so far.   In response to the proposal XML Web Services ("Make recovery data available as a web service via SOAP XML."), I wrote:

I agree that some type of rigorous programmatic interface that allows developers to access the data from is essential. I think that SOAP and the rest of the associated WS-* stack might be one way to implement such access mechanisms, but I would not want SOAP to be the exclusive protocol used. I would argue, for instance, that a RESTful approach is also an excellent alternative to consider for

On a front closer to what our work has been about, in response to Making stimulus spending data accessible to the public, I wrote

I'm one of the Berkeley researchers mentioned above involved with making recommendations on how data feeds should be used to make the recovery more transparent (see and

Although some (but not all) agencies receiving and disbursing recovery funds are using feeds in their reporting (see a list that we compiled at, the best data on dollars appropriated, obligated, or spent is in the Excel spreadsheets. Although there are apparently templates for the reports, they keep changing format, and there's nothing to stop agencies from inserting extra fields or omitting other fields. We know this for a fact since we've written programs to scrape the data from the spreadsheets and find it a challenge to keep up with changes that keep breaking our scripts.

The federal government should make the data available in the form of XML feeds in the first place (backed by a schema so that we can check that the data is valid), instead of making people who want to use that data scrape it out of Excel in a highly fragile process.
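To illustrate why the scraping is fragile, here is a small Python sketch of the kind of check a scraper needs before trusting a report: compare the header row against the expected template and flag columns that were added or dropped. The column names and the sample row are hypothetical, and the sketch assumes the spreadsheet has already been exported to CSV; it is not our actual scraping code.

```python
import csv
import io

# Hypothetical template columns; the real report templates are not
# reproduced here.
EXPECTED_COLUMNS = ["Agency", "TAFS", "Obligations", "Outlays"]


def check_header(csv_text, expected=EXPECTED_COLUMNS):
    """Return (missing, extra) column names relative to the expected template."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [cell.strip() for cell in next(reader)]
    missing = [col for col in expected if col not in header]
    extra = [col for col in header if col not in expected]
    return missing, extra


if __name__ == "__main__":
    # Hypothetical report in which one field was inserted and one omitted.
    sample = (
        "Agency,TAFS,Award Type,Obligations\n"
        "Example Agency,00-0000,Grant,100\n"
    )
    missing, extra = check_header(sample)
    print("missing:", missing)
    print("extra:", extra)
```

A published schema would make this check unnecessary on the consumer side: the publisher would validate once, instead of every scraper guessing at the template.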

As I wrote yesterday, it will be interesting to see how well the site actually does at aggregating a large number of proposals and surfacing the best ones. Moreover,


Tracking the stimulus/recovery in the news

Over the last couple of months, I've been studying the Stimulus through the lens of the weekly reports published on   My colleagues Erik Wilde and Eric Kansa (at the School of Information at UC Berkeley) and I  made recommendations on how data feeds should be used to foster transparency around stimulus data,  in addition to developing prototypes of the types of visualizations one could do with such data feeds.   We're continuing work on that front, specifically scraping data currently found in Excel and transforming that data into XML (Atom) feeds.

It is much easier to transform the financial data into visualizations and analyses once it is in the form of feeds (rather than Excel). The federal government should make the data available in the form of XML in the first place (backed by a schema so that we can check that the data is valid), instead of making people who want to use that data scrape it out of Excel in a highly fragile process.
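As a rough illustration of the Excel-to-Atom transformation, the following Python sketch turns already-scraped rows into a minimal Atom feed using only the standard library. The row fields (agency, week, obligated), the feed id, and the author name are hypothetical placeholders, and putting the dollar figure in the entry summary is just one possible modeling choice, not a description of our actual feeds.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace

# Hypothetical rows standing in for data scraped from a weekly report.
ROWS = [
    {"agency": "Example Agency A", "week": "2009-04-24", "obligated": "1000"},
    {"agency": "Example Agency B", "week": "2009-04-24", "obligated": "2500"},
]


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def rows_to_atom(rows, feed_id, title, updated):
    """Build an Atom feed element with one entry per scraped row."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = "Example scraper"  # placeholder

    for i, row in enumerate(rows, start=1):
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = f"{feed_id}:entry:{i}"
        ET.SubElement(entry, atom("title")).text = f"{row['agency']} ({row['week']})"
        ET.SubElement(entry, atom("updated")).text = updated
        ET.SubElement(entry, atom("summary")).text = f"Obligated: {row['obligated']}"
    return feed


if __name__ == "__main__":
    feed = rows_to_atom(
        ROWS,
        feed_id="urn:example:recovery-feed",  # hypothetical identifier
        title="Example weekly report feed",
        updated="2009-04-24T00:00:00Z",
    )
    print(ET.tostring(feed, encoding="unicode"))
```

Once the data is in this shape, any feed reader or XML toolchain can consume it without knowing anything about the layout of the original spreadsheet.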

To discern the meaning of the data we are extracting from various government sites,  I am now trying to keep up with the news around the recovery.  Here are some of the sources I've been tracking so far:

This list represents my current starting points.  I naturally expect to find a lot of other useful sources as I go along.