
ARRA Treasury Account Symbols: the outcome of our FOIA request

In July, I wrote about why I've been looking for Recovery TAFS and appropriations. In an attempt to get an official list from the US federal government, Eric Kansa and I sent a FOIA letter to OMB to request the release (in electronic form) of a complete and up-to-date list of all Recovery Act (ARRA) TAFS (Treasury Appropriation Fund Symbols). We had known of two out-of-date and potentially incomplete lists of the ARRA TAFS:

  1. the worksheet entitled "92_AARP_TAFS_DD_Detail" in the May 8, 2009 weekly report from USAID
  2. a pdf published by ProPublica on April 1, 2009.

We specifically asked for an up-to-date Excel spreadsheet with the same columns as the worksheet "92_AARP_TAFS_DD_Detail" — but with an explanation of what each of the columns meant.  We  also encouraged the OMB to make this data available on an ongoing basis as an XML document published on the OMB website and kept up to date, with an explanation of each field.

Last week, we got what we asked for: an Excel spreadsheet (see Internet Archive metadata), which I've also uploaded as a Google spreadsheet. Note the description of the spreadsheet to be found in the first sheet:

In a letter dated August 24 to OMB's Freedom of Information Officer, you requested that OMB provide you with an up-to-date Excel spreadsheet with the same columns as a worksheet you emailed on October 16. The Berk_FOIA_Data tab in this Excel file provides up-to-date information using the same columns in the file you sent. The information is up-to-date as of October 19, 2009, and shows a list of each Treasury Appropriation Fund Symbol (TAFS) associated with the Recovery Act (RA). Below is a description of each column in the Berk_FOIA_Data tab.

I've not had an opportunity to complete my analysis of the FOIA spreadsheet and to correlate the data to the recipient reporting. You'll note that there are 342 TAFS in the spreadsheet. To derive a list of Treasury Account Symbols (TAS, as opposed to TAFS), we concatenate the Treasury Agency Code with the Treasury Bureau Code (separated by a '-') and bundle all the corresponding TAFS. See the resulting list, with a total of 313 TAS. You'll note that a spreadsheet that lists the TAS as of Sept 13, 2009 has 309 symbols, while the current HTML list shows 327 TAS (along with 32 place-holder symbols). The differences between those lists are something to nail down next. At any rate, even something like the list of Treasury Accounts associated with the Recovery Act is more fluid than I would have expected at this point.
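To make the derivation concrete, here is a small sketch of the TAFS-to-TAS grouping described above. The field names and sample rows are my own invention for illustration, not the actual column names in the FOIA spreadsheet:

```python
from collections import defaultdict

def group_tafs_into_tas(tafs_rows):
    """Group TAFS rows into Treasury Account Symbols (TAS).

    Each row is a dict with (hypothetical) keys 'treasury_agency_code'
    and 'treasury_bureau_code'; the TAS key is the two codes joined by
    a hyphen, and all TAFS sharing that key are bundled under it.
    """
    tas = defaultdict(list)
    for row in tafs_rows:
        key = f"{row['treasury_agency_code']}-{row['treasury_bureau_code']}"
        tas[key].append(row)
    return dict(tas)

# Illustrative rows only; the real spreadsheet has 342 TAFS.
rows = [
    {"treasury_agency_code": "91", "treasury_bureau_code": "1909",
     "title": "ST FISCAL STABILIZATION FUND"},
    {"treasury_agency_code": "91", "treasury_bureau_code": "0103",
     "title": "IMPACT AID"},
]
grouped = group_tafs_into_tas(rows)
```

Because several TAFS can share one TAS key, the number of TAS (313) comes out smaller than the number of TAFS (342).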

One thing that has puzzled me is why there are so many TAFS with $0.00 for the Treasury warrant. You can find an explanation in the FOIA spreadsheet:

Treasury Warrant is the sum that Treasury warranted to the TAFS. You can think of a warrant as being the initial deposit in a new checking account. For many of the TAFSs on the list, you can track the amounts appropriated in the law to the amount of the Treasury warrant. In some cases, however, you cannot track back to actual amounts because the funding in the law is formula based. In many cases, a TAFS has a zero in the Treasury Warrant column. The primary reason for this is that these TAFSs receive RA funds via a transfer from other TAFSs.

Hmmm.  We're going to have to understand the relevant formulas.

Acknowledgement: A big thanks to Brian Carver for providing us with valuable advice on how to formulate, draft, and send a FOIA request, and for helping us interpret what happens during the FOIA process.


Web Services for

Today, my colleagues Erik Wilde, Eric Kansa, and I are pleased to announce our new report "Web Services for" and its companion website. Last week, the site's redesign was made public to much fanfare. It is the U.S. government's official website for publicly documenting how funds from the American Recovery and Reinvestment Act of 2009 (ARRA) have been allocated and spent. Our work focuses on a crucial aspect that has yet to receive sufficient attention, namely, how data about Recovery Act spending will be made available in machine-readable form for analysis, interpretation, and visualization by third-party applications. In our report and on our website, we propose a reporting architecture, create some sample feeds based on that architecture, and demonstrate how that data could be used in a simple map-based mashup.

Here are some highlights from our report, which I quote (with a bit of editing):

  • Design priorities need to shift from focusing on deploying an attractive Web site toward designing ARRA web services that support the reuse of data in third-party applications.
  • These services should allow any party to receive the complete set of ARRA reporting data in a timely and easily usable manner, so that, in principle, the site's full functionality could be replicated by a third party.
  • Our proposed architecture is based on the principles of Representational State Transfer (REST) and on always using the simplest, most widely known and supported technology for any given task.
  • We recommend the feed-based dissemination of ARRA reporting data using the most widely used technologies on the Internet today: HTTP for service access, Atom for the service interface, and XML for the data provided by the service. This approach allows access from sophisticated server-based applications as well as from resource-constrained devices such as mobile phones.
  • The manner in which data flows between the reporting and publishing sites is of critical importance. Ideally, one should use the Web services offered by the other.
  • We strongly recommend that Recovery reporting systems adopt the Atom syndication format for feeds.  Feeds represent a major positive development in making government data more open to citizen review and reuse and provide a unique ability to do so by merging utility for humans as well as machines.
  • While not formally standardized, feed autodiscovery is well supported by current browsers and could be implemented reliably with a well-defined set of implementation guidelines for Web pages offered by
  • We strongly recommend making feed paging and archiving mandatory, so that the feeds are not just a temporary way of communicating that information has become available. Instead, the feed pages should be available as persistent and permanent access points, so that accessing information via feeds can be done robustly and reliably.
  • ARRA data dissemination services should be more resource-oriented than service-oriented.  XML representations should contain links (in the form of URIs) to related data resources, thereby representing the relationships between the different concepts which are relevant for reporting.
  • The Recovery reporting schema uses many different coding systems and identifiers. Publication of resources related to some of these identifiers will be of great value.  (We list key identifiers in the report.)
  • There are many possible analyses that people may wish to perform on Recovery data,  making it difficult  to accommodate them all. Therefore, querying services should be oriented toward making machine-readable representations of data available, so that third party developers can easily populate their own analysis engines and run their own specialized algorithms on that data.
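As an illustration of the paging-and-archiving recommendation above, here is a sketch of how one archived page of an Atom feed might be generated. The URLs and entry fields are placeholders of my own, not actual recovery reporting endpoints; the `prev-archive` link relation comes from RFC 5005 (Feed Paging and Archiving):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)

def build_archive_page(page_url, prev_url, entries):
    """Build one archived page of an Atom feed (RFC 5005 style).

    Archived pages are permanent, so a client can walk backward
    through the whole history via the 'prev-archive' links.
    """
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = "ARRA recipient reports"
    ET.SubElement(feed, f"{{{ATOM_NS}}}link", rel="self", href=page_url)
    # Pointer to the previous permanent page in the archive chain.
    ET.SubElement(feed, f"{{{ATOM_NS}}}link", rel="prev-archive", href=prev_url)
    for e in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = e["title"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = e["id"]
    return ET.tostring(feed, encoding="unicode")

xml = build_archive_page(
    "http://example.gov/feeds/reports/page/2",   # placeholder URLs
    "http://example.gov/feeds/reports/page/1",
    [{"title": "Award 0001", "id": "urn:example:award:0001"}],
)
```

The point of the sketch is the link structure: because each archived page is a stable resource, feeds stop being a transient notification channel and become a robust access point for the full data set.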

Erik Wilde has also commented on our report. We welcome and look forward to your feedback.

Finally, we are grateful to the Sunlight Foundation for a grant that helped to support this effort.


Advice for

Rusty Talbot posted the following request for feedback on the Sunlight Labs list this morning:

The Recovery, Accountability, & Transparency Board wishes to have an open discussion with all interested developers about how data should be made available via

As you are all aware, a new version of the site will be released soon. From a data standpoint, the initial release of the new site will replicate existing functionality. However, the Board aims to set a new standard of transparency with this site and would therefore like to make the data available in the most convenient and straightforward way (or ways) possible so you can use and analyze official, up-to-date Recovery Act data. We need your input to achieve this goal.

Please let us know how the site could best meet your needs in terms of  machine-readable data format(s) and standards, APIs, guidance, training, etc. [emphasis mine]

As I waited for Rusty to respond to my question of how best to provide feedback, Luigi Montanez went ahead and posted a series of excellent pointers. I second Luigi's advice, also commend the recent OMB Watch Recovery Act Transparency Status Report, and have similar general web development advice to offer, which I had written up as "Making Your Web Site Mashable" (pdf) (Chapter 12 of my book Pro Web 2.0 Mashups).

In terms of work specifically related to the Recovery Act, my Berkeley colleagues Erik Wilde, Eric Kansa, and I published a report "Proposed Guideline Clarifications for American Recovery and Reinvestment Act of 2009" in which we proposed and prototyped the use of Atom feeds to disseminate Recovery spending data. We are currently at work on updated recommendations based on the latest Recovery Act OMB Guidance.

One of the most important things that has made Recovery spending less than transparent is how difficult it has been to locate basic accounting data. For example, after looking for months, I have yet to locate a reliable list of Recovery TAFS, basically a list of all the pots of money (as tallied by Treasury) and the maximum amount of money we expect to see in each pot (the dollars appropriated). Now, the site does list the amounts obligated and spent by agency, but how much money has been appropriated? That basic data should be clearly documented there, so that we can track the flow of money reliably from the originating legislation to Treasury, out to the agencies, and then to contractors and grantees or the states. (I will note that ProPublica's Stimulus Tracker does break down the totals by agency but doesn't publish the list of individual accounts.)

At any rate, there is more to say — but I'll wait until Rusty responds to what is here.


calendar data from Educause put into a Google Calendar

I'm starting to prepare my notes for the pre-conference seminar Creating and Enabling Web Mashups that I'll be leading on November 3, 2009 at the 2009 EDUCAUSE Annual Conference. I'm looking for good examples to use in the seminar. One that I'm contemplating is showing how to import the Educause 2009 calendar, which is available as an iCalendar file (linked from the main program page). If you import the iCalendar file, you can produce a Google calendar. (You have to navigate to November 2009 to see any events.)
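If you want to peek inside an .ics file before handing it to Google Calendar, a rough parser is enough for a demo. This sketch (with a made-up sample event of my own) extracts only SUMMARY and DTSTART, and skips the line unfolding and timezone handling that a real iCalendar reader must do:

```python
def parse_ics_events(text):
    """Very small iCalendar reader: pull SUMMARY and DTSTART out of
    each VEVENT block. Real .ics files also need unfolding of wrapped
    lines and timezone resolution; this sketch ignores both."""
    events, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            # Property parameters (e.g. ";TZID=...") hang off the key.
            name = key.split(";")[0]
            if name in ("SUMMARY", "DTSTART"):
                current[name] = value
    return events

# Illustrative sample, not the actual Educause feed contents.
sample = """BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART;TZID=America/Denver:20091103T080000
SUMMARY:Creating and Enabling Web Mashups
END:VEVENT
END:VCALENDAR"""
events = parse_ics_events(sample)
```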


plotting data for counties on Google Maps: Part I

There is a huge amount of government and socio-economic data gathered at the county level. It would be nice to be able to plot that data on a desktop or online map (e.g., Google Maps). This morning I posted a question on the Sunlight Labs mailing list asking for some help:

I would like to display US counties on a Google map based on some scalar value (e.g., population) for each county and a color map that associates values to colors. Does anyone know of a library that makes this easy to do? (I'm interested in doing the same for other administrative regions, such as zip codes and congressional districts.)

(I have found a good discussion of the topic and other references that might be helpful, but I have not seen the functionality I'm looking for distilled down into an easy-to-use library.)
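The "color map that associates values to colors" piece is easy to sketch independently of any GIS library. The endpoint colors and the FIPS-keyed sample populations below are arbitrary choices of mine, just to show the shape of the mapping:

```python
def value_to_hex(value, vmin, vmax, low=(255, 255, 204), high=(189, 0, 38)):
    """Linearly interpolate between two RGB endpoints to get a fill
    color for one county. 'low' and 'high' are arbitrary endpoints
    (a pale yellow and a dark red)."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp out-of-range values
    rgb = tuple(round(l + t * (h - l)) for l, h in zip(low, high))
    return "#%02x%02x%02x" % rgb

# FIPS code -> population (illustrative numbers only).
populations = {"06075": 808976, "56009": 13833}
vmin, vmax = min(populations.values()), max(populations.values())
colors = {fips: value_to_hex(v, vmin, vmax) for fips, v in populations.items()}
```

Whether the colored counties then get drawn as browser polygons or baked into a static overlay image, this value-to-color step is the same.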

Building a ground overlay

When I tweeted my question, I got a very helpful response from Sean Gillies:

That's a lot of polygons (3489) to draw in the browser. Make an image layer with OpenLayers?

Sean confirmed what I was thinking: that I had to compute a static image to use as an overlay — otherwise, drawing 3000+ polygons would slow down Google Maps prohibitively. In fact, in many ways, I've been trying to use an approach I've seen in the demo gallery of the Google Maps API v3: John Coryat's ProjectedOverlay example, which "uses OverlayView to render an image inside a given bounding box (LatLngBounds) on top of the map". (You can look at the overlay image (.png) directly and reuse ProjectedOverlay.js.)

So one approach would be to calculate a png of the counties (colored appropriately), and this png would provide an efficient way to display county data. I had started down this road a while ago — Sean's post gave me some more direct guidance on how to create a useful Python-based desktop GIS setup to be able to handle such tasks as creating my desired map in png form. To be honest, I've found the whole open-source GIS world fairly confusing. I bought and read part of Gary Sherman's Desktop GIS: Mapping the Planet with Open Source Tools (Illustrated edition. Pragmatic Bookshelf, 2008) and was considering installing FWTools, GRASS GIS, and Quantum GIS. His post alerted me to, and convinced me to try, OSGeo4W, which is

a binary distribution of a broad set of open source geospatial software for Win32 environments (Windows XP, Vista, etc). OSGeo4W includes GDAL/OGR, GRASS, MapServer, OpenEV, uDig, QGIS as well as many other packages (about 70 as of summer 2008).

I installed OSGeo4W but have not been able to figure out the Python bindings (and hence can't yet try out the code that Sean posted). Neither has the Python setup from FWTools 2.4.3 worked for me. My next step is to follow the instructions at the Python Package Index (GDAL 1.6.1) to see whether I'll have better luck.

Joshua Tauberer's WMS service

Joshua Tauberer responded to my query by referring me to his experimental WMS service, which produces WMS layers for entities ranging from congressional and state districts to counties. I modified one of his examples to try to plot the counties. For some reason, not all the counties show up yet. Still, this approach is very promising, since it would save me the work of calculating the coordinates of the county boundaries to begin with. I'll have to come back to study and apply the techniques documented in the WMS Server API Documentation.
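For reference, a WMS GetMap request is just a URL with standardized query parameters, which is part of what makes the approach attractive. This sketch uses a placeholder endpoint and layer name of my own rather than the actual service's:

```python
from urllib.parse import urlencode

def wms_getmap_url(base, layer, bbox, width=512, height=512):
    """Assemble a standard WMS 1.1.1 GetMap request URL.

    'base' and 'layer' here are placeholders; bbox is
    (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(c) for c in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "true",
    }
    return base + "?" + urlencode(params)

# Roughly the continental US, against a hypothetical endpoint.
url = wms_getmap_url("http://example.org/wms", "counties",
                     (-125.0, 24.0, -66.0, 50.0))
```

Fetching that URL from a real WMS server returns a rendered PNG tile, which is exactly the kind of image layer Sean suggested.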

Other things to study further


I'm looking forward to Transparency Camp 2009

I'll be at TransparencyCamp 2009 tomorrow. (You can follow the conference tweets at #tcamp09, whether or not you'll be in physical attendance.) Since TCamp09 is an unconference, any formal agenda will be determined at the conference by the sessions attendees propose there. I'd like to see and attend sessions on the following topics:

  • projects/techniques to track the finances of the US Government. I've been working on tracking the Recovery Act (aka the Stimulus) and would like to compare notes with others involved with understanding how budgets are created and how money is allocated and spent at the federal level.
  • projects/techniques on how to generate an ontology or mapping of the structures of the federal and state governments (e.g., how would we map the US Government Manual into structured machine-readable form?)
  • I'd love to hear Joshua Tauberer and Carl Malamud tell us about their respective projects.
  • business/sustainability models around government transparency projects. I'd like to devote more time to government transparency, but how do we pay the bills?

A clarification of why I'm looking for Recovery TAFS and appropriations

In response to a question I received on a mailing list about my query "Does anyone know of a complete and up-to-date list of Recovery Act accounts?" (specifically, why I was looking for amounts appropriated and not just obligated and spent for the Recovery), I wrote the following clarification (which I have edited lightly):

In addition to the amount of money that is obligated and spent, isn't there also the amount of money that is appropriated? The amount obligated and spent goes up, but isn't the appropriation supposed to be the maximum that the obligated and spent amounts ever reach? (I'm an accounting newbie, so correct me if I misunderstand what these terms mean.) What I'm trying to understand right now are statements like "ARRA is a $787 billion dollar bill" and "the Department of Education is getting $100 billion." Specifically, I'd like to see how various line items add up to the totals quoted.

The amounts obligated used to be reported in the weekly Excel spreadsheets from the agencies. For example, consider the April 3 report from the Department of Ed:

and the corresponding spreadsheet:

We're told that:

  • Total Available: $11,363,064,856
  • Total Paid Out: $0

The spreadsheet (specifically the "Weekly Update" worksheet) actually supports this statement — here, I copy the table and add the totals line.

Agency Code | Account Code | Program Description (Account Title) | Total Appropriation | Total Obligations | Total Disbursements
(The optional Sub-Account Code column is blank for every row and is omitted here.)
91 | 0103 | IMPACT AID, RECOVERY ACT | $100,000,000 | $0 | $0
91 | 0196 | HIGHER EDUCATION, RECOVERY ACT | $100,000,000 | $0 | $0
91 | 0197 | INSTITUTE OF ED SCIENCES, RECOVERY ACT | $250,000,000 | $0 | $0
91 | 0198 | STUDENT AID ADMIN, RECOVERY ACT | $60,000,000 | $0 | $0
91 | 0199 | STUDENT FINANCIAL ASST, RECOVERY ACT | $16,483,000,000 | $198,901,281 | $0
91 | 0207 | INNOVATION & IMPROVEMENT, RECOVERY ACT | $200,000,000 | $0 | $0
91 | 0299 | SPECIAL EDUCATION, RECOVERY ACT | $12,200,000,000 | $5,970,012,399 | $0
91 | 0302 | REHAB SRVCS & DISABILITY RSRCH, RECOVERY ACT | $680,000,000 | $315,570,633 | $0
91 | 0901 | ED FOR THE DISADVANTAGED, RECOVERY ACT | $13,000,000,000 | $4,878,580,543 | $0
91 | 1001 | SCHOOL IMPROVEMENT PRG, RECOVERY ACT | $720,000,000 | $0 | $0
91 | 1401 | OFC OF INSPECTOR GENERAL, RECOVERY ACT | $14,000,000 | $0 | $0
91 | 1909 | ST FISCAL STABILIZATION FUND, RECOV ACT | $53,600,000,000 | $0 | $0
Total | | | $97,407,000,000 | $11,363,064,856 | $0
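As a quick arithmetic check that the rows really do add up, here are the table's dollar amounts keyed in as integers:

```python
# Column values from the Department of Education table above,
# in row order (dollar amounts as integers).
obligations = [0, 0, 0, 0, 198_901_281, 0, 5_970_012_399,
               315_570_633, 4_878_580_543, 0, 0, 0]
appropriations = [100_000_000, 100_000_000, 250_000_000, 60_000_000,
                  16_483_000_000, 200_000_000, 12_200_000_000,
                  680_000_000, 13_000_000_000, 720_000_000,
                  14_000_000, 53_600_000_000]

total_obligated = sum(obligations)        # the "Total Available" figure
total_appropriated = sum(appropriations)  # the totals-line appropriation
```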

You'll see that the total amounts obligated and disbursed match what's listed on the web. What my previous post is trying to get at is:

1) how to get an up-to-date list of all these accounts (there are 12 listed for education here, but in a tally I'm working on, there are 14)


2) what the appropriation for each account is. I'm happy to see the total appropriation for the Dept of Ed as $97,407,000,000 — since it matches what ProPublica lists, not to mention statements like "The American Recovery and Reinvestment Act of 2009 (ARRA) provides approximately $100 billion for education."

Once I have an accurate list of TAFS (e.g., 91-1909 for the State Fiscal Stabilization Fund = $53.6 billion), I'll use that list to slot in the spending data.

Does anyone know of a complete and up-to-date list of Recovery Act accounts?

Does anyone know of a complete and up-to-date list of Recovery Act TAFS — basically, a list of all the basic accounts of money flowing from the Recovery Act? There was one published by ProPublica on April 1, 2009 (from the post Falling Short of Expectations So Far – ProPublica) and one buried in spreadsheets coming from the feds (e.g., in the worksheet entitled "92_AARP_TAFS_DD_Detail"). I've been working on synthesizing the two lists and updating them with the latest appropriation numbers that we can glean from our scrapes.

I'm close to arriving at a list that I'm happy with.  However, this is the type of list that the feds must have, but one I've not been able to find.  Anyone know of one?

My project idea for the Freebase Hack Day

[Post in progress]

In this post, I will write about my project proposal for the upcoming Freebase HackDay.

The project is to elaborate on the prototype "An org chart of the US Federal Government based on OMB agency and bureau codes."

See what I've written at

I'm writing up a longer post right now, but let me list a few things I'd love help with:

1) To do the reconciliation of government agencies to Freebase, I built a primitive Acre app to help me apply Freebase Suggest to a lot of items (see the source and a background writeup of the idea). Refining this app would be very useful!

2) As part of the reconciliation process, it would be helpful to come up with a good way to figure out from the Suggest API whether a given suggestion is offered with high confidence. Tom Morris has some ideas on this.
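Pending a real confidence score from the API, one crude stand-in is plain string similarity between the name being reconciled and a suggested topic's name. The 0.85 threshold below is an arbitrary illustration of mine, not anything the Suggest API defines:

```python
from difflib import SequenceMatcher

def match_confidence(query, candidate):
    """Crude reconciliation confidence: normalized similarity (0..1)
    between the agency name we have and a suggested topic name.
    This is only a stand-in for whatever score a suggest service
    would return."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

def is_confident(query, candidate, threshold=0.85):
    """Accept a suggestion automatically only above the threshold;
    everything else goes to a human review queue."""
    return match_confidence(query, candidate) >= threshold
```

The useful property is separating auto-accepts from near-misses: "Department of Education" vs. "Department of Energy" scores well below the threshold even though the strings share a long prefix.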

3) Writing the data back from the reconciliation would be very useful. How should we model the OMB codes and apply them to the government agencies in Freebase? And what about the entities I couldn't find in Freebase — should we create new entities for them?

4) Re what Spencer wrote: yes, I'd love to see someone come up with a better visualization than what I currently have — especially if there is a generic viewer.

A first pass at an org chart for the US Federal government

Since I started trying to understand how the US Government works, I've been trying to find a chart that would list all the different departments, agencies, and other organizational entities that comprise the government — and show how they are related to each other. I can't believe that I'd be the only person to find such an org chart useful; indeed, this idea is echoed in a project idea listed on the Sunlight Labs wiki as OPML the Federal Government:

Project Idea: This is a quick win– just create an OPML file of the existing structure of the Federal Government agencies in all branches.

As a step toward creating such a representation, I've scraped the data in Appendix C of OMB Circular No. A-11 (Sept 2008).

Under the MAX system, OMB assigns agency and bureau codes that are used to identify and access data in the budget database. The following table lists these codes in budget order. It also provides the corresponding agency codes assigned by Treasury. In certain instances, a different Treasury agency code may be used for some accounts in an agency; a complete listing can be found in the Budget Accounts Title (BAT) file.

I've uploaded this PDF to Scribd to make it easier for readers to see the data it contains:

OMB Circular A-11, Appendix C

I read this PDF into Adobe Acrobat 8, saved it as "XML 1.0", massaged the XML a bit by hand to make it easier to apply some XQuery to create a starter OPML 1.0 file, and then did some more manual editing to represent the data in the correct hierarchy to produce:

My working assumption is that OMB Agency/Bureau codes + Treasury Agency Codes provide the key to unlocking a significant part of the higher levels of the US Federal Government. More on this assumption later.
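For anyone wanting to script the OPML-building step instead of hand-editing, here's a sketch that emits an OPML 1.0 outline from agency/bureau pairs. The attribute names and the sample codes are my own placeholders, not the actual MAX codes from Appendix C:

```python
import xml.etree.ElementTree as ET

def agencies_to_opml(agencies):
    """Turn (agency name, OMB agency code, [(bureau name, bureau code)])
    tuples into an OPML 1.0 outline, one nested <outline> per bureau.
    Attribute names like 'ombAgencyCode' are my own invention."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "US Federal Government (OMB Appendix C)"
    body = ET.SubElement(opml, "body")
    for agency_name, omb_code, bureaus in agencies:
        agency = ET.SubElement(body, "outline",
                               text=agency_name, ombAgencyCode=omb_code)
        for bureau_name, bureau_code in bureaus:
            ET.SubElement(agency, "outline",
                          text=bureau_name, ombBureauCode=bureau_code)
    return ET.tostring(opml, encoding="unicode")

# Illustrative codes only, not verified against the circular.
opml_xml = agencies_to_opml([
    ("Department of Education", "018",
     [("Office of Inspector General", "45")]),
])
```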

You can see this OPML rendered as follows:

Some Possible Next Steps:
