Data Unbound

Helping organizations access and share data effectively. Special focus on web APIs for data integration.

November 23rd, 2009

ARRA Treasury Account Symbols: the outcome of our FOIA request

In July, I wrote about why I've been looking for Recovery TAFS and appropriations. In an attempt to get an official list from the US federal government, Eric Kansa and I sent a FOIA letter to OMB to request the release (in electronic form) of a complete and up-to-date list of all Recovery Act (ARRA) TAFS (Treasury Appropriation Fund Symbols). We had known of two out-of-date and potentially incomplete lists of the ARRA TAFS:

  1. the worksheet entitled "92_AARP_TAFS_DD_Detail" in May 8, 2009 weekly report from USAID
  2. a pdf published by ProPublica on April 1, 2009.

We specifically asked for an up-to-date Excel spreadsheet with the same columns as the worksheet "92_AARP_TAFS_DD_Detail" — but with an explanation of what each of the columns meant.  We  also encouraged the OMB to make this data available on an ongoing basis as an XML document published on the OMB website and kept up to date, with an explanation of each field.

Last week, we got what we asked for:  an Excel spreadsheet ( see Internet Archive metadata), which I've also uploaded as a Google spreadsheet.  Note the description of the spreadsheet to be found in the first sheet:

In a letter dated August 24 to OMB's Freedom of Information Officer, you requested that OMB provide you with an up-to-date Excel spreadsheet with the same columns as a worksheet you emailed on October 16. The Berk_FOIA_Data tab in this Excel file provides up-to-date information using the same columns in the file you sent. The information is up-to-date as of October 19, 2009, and shows a list of each Treasury Appropriation Fund Symbol (TAFS) associated with the Recovery Act (RA). Below is a description of each column in the Berk_FOIA_Data tab.

I've not had an opportunity to complete my analysis of the FOIA spreadsheet  and to correlate the data to the recipient reporting.   You'll note that there are 342 TAFS in the spreadsheet.  To derive a list of Treasury Account Symbols (TAS as opposed to TAFS), we concatenate the  Treasury Agency Code with the Treasury Bureau Code (separated by a '-') and bundle all  the corresponding TAFS.   See the resulting list, with a total of 313 TAS.  You'll note that a spreadsheet that lists the TAS as of Sept 13, 2009 has 309 symbols, while the HTML list on federalreporting.gov currently lists 327 TAS (along with 32 place-holder symbols).   The differences in those lists is something to nail down next.  At any rate, even something like the list of Treasury Accounts associated with the Recovery Act is more fluid than what I would have expected at this point.

One thing that has puzzled me is why there are so many TAFS with $0.00 for the treasury warrant.  You find an explanation in the FOIA spreadsheet:

Treasury Warrant is the sum that Treasury warranted to the TAFS. You can think of a warrant as being the initial deposit in a new checking account. For many of the TAFSs on the list, you can track the amounts appropriated in the law to the amount of the Treasury warrant. In some cases, however, you cannot track back to actual amounts because the funding in the law is formula based. In many cases, a TAFS has a zero in the Treasury Warrant column. The primary reason for this is that these TAFSs receive RA funds via a transfer from other TAFSs.

Hmmm.  We're going to have to understand the relevant formulas.

Acknowledgement:  A big thanks to Brian Carver for providing us valuable advice on how to formulate, draft and send a FOIA request and helping us to interpret what's happening during a FOIA process.

October 5th, 2009

Web Services for Recovery.gov

Today, my colleagues Erik Wilde, Eric Kansa, and I are pleased to announce our new report "Web Services for Recovery.gov" and its companion website recovery.berkeley.edu.   Last week, the redesign of Recovery.gov was made public to much fanfare.  Recovery.gov is  the U.S. government’s official website for publicly documenting how funds from the American Recovery and Reinvestment Act of 2009 (ARRA) have been allocated and spent.   Our work  focuses on a crucial aspect of Recovery.gov that has yet to receive sufficient attention, namely, how data Recovery Act spending will be made available in machine-readable form for analysis, interpretation, and visualization  by third-party applications. In our report and in our website, we propose a reporting architecture,  created some sample feeds based on that architecture, and demonstrate how that data could be used in a simple map-based mashup.

Here are some highlights from our report, which I quote (with a bit of editing):

  • Design priorities for recovery.gov need to shift from focusing on deploying an attractive Web site toward designing ARRA web services to support reuse of data in third-party applications.
  • These services should allow any party  to receive the complete set of ARRA reporting data in a timely and easily usable manner, so that in principle, the full functionality of Recovery.gov could be replicated by a third party.
  • Our proposed architecture is based on the principles of Representational State Transfer (REST) and always attempting to use the simplest and most widely known and supported technology for any given task.
  • We recommend the feed-based dissemination of ARRA reporting data using the most widely used technologies on the Internet today: HTTP for service access, Atom for the service interface, and XML for the data provided by the service. This approach allowing access from sophisticated server-based applications or from resource-constrained devices such as mobile phones.
  • The manner which data flows from FederalReporting.gov to Recovery.gov is of critical importance. Ideally, Recovery.gov should use Web services offered by FederalReporting.gov.
  • We strongly recommend that Recovery reporting systems adopt the Atom syndication format for feeds.  Feeds represent a major positive development in making government data more open to citizen review and reuse and provide a unique ability to do so by merging utility for humans as well as machines.
  • While not formally standardized, feed autodiscovery is well supported by current browsers and could be implemented reliably with a well-defined set of implementation guidelines for Web pages offered by Recovery.gov.
  • We strongly recommend making feed paging and archiving mandatory, so that the feeds are not just a temporary way of communicating that information has become available. Instead, the feed pages should be available as persistent and permanent access points, so that accessing information via feeds can be done robustly and reliably.
  • ARRA data dissemination services should be more resource-oriented than service-oriented.  XML representations should contain links (in the form of URIs) to related data resources, thereby representing the relationships between the different concepts which are relevant for reporting.
  • The Recovery reporting schema uses many different coding systems and identifiers. Publication of resources related to some of these identifiers will be of great value.  (We list key identifiers in the report.)
  • There are many possible analyses that people may wish to perform on Recovery data,  making it difficult  to accommodate them all. Therefore, querying services should be oriented toward making machine-readable representations of data available, so that third party developers can easily populate their own analysis engines and run their own specialized algorithms on that data.

Erik Wilde has also commented on our report. We welcome and look forward to your feedback.

Finally, we are grateful to the Sunlight Foundation for a grant that helped to support this effort.

July 29th, 2009

A clarification of why I'm looking for Recovery TAFS and appropriations

In response to a question I received on a mailing list in response to my query  Does anyone know of a complete and up-to-date list of Recovery Act accounts? concerning why I was looking for amounts appropriated and not just obligated an spent for the Recovery, I wrote the following clarification (which I have edited lightly):

In addition to the amount of money that is obligated and spent, isn't there also the amount money that is appropriated?  The amount obligated and spent goes up, but isn't the appropriation supposed to be maximum that the obligated and spent amounts ever reach?  (I'm an accounting newbie, so correct me if I misunderstand what these terms mean.)  What I'm trying to understand right now are statements like "ARRA is a $787 billion dollar bill" and the Department of Education is getting a "$100 billion".   Specifically, I'd like to see how various line items add up to the totals quoted.

The amounts obligated used to be reported in the weekly excel spreadsheets from the agencies.  For example, consider the April 3 report from the Department of Ed:

http://www.recovery.gov/?q=content/weekly-report&agency_code=91&agency=&startdate=2009-04-03&noofreports=2&summarytype=&report_id=146&nex=

and the corresponding spreadsheet:

http://www.recovery.gov/sites/default/files/weeklyreport_WR20090403ED.xls

At http://www.recovery.gov/?q=content/weekly-report&agency_code=91&agency=&startdate=2009-04-03&noofreports=2&summarytype=&report_id=146&nex=, we're told that:

  • Total Available: $11,363,064,856
  • Total Paid Out: $0

The spreadsheet (specifically the "Weekly Update" worksheet) actually supports this statement — here, I copy the table and add the totals line.

Program Source/ Treasury Account Symbol: Agency Code Program Source/Treasury Account Symbol: Account Code Program Source/Treasury Account Symbol; Sub-Account Code (OPTIONAL) Program Description (Account Title) Total Appropriation Total Obligations Total Disbursements
91 0103 IMPACT AID, RECOVERY ACT $100,000,000 $0 $0
91 0196 HIGHER EDUCATION, RECOVERY ACT $100,000,000 $0 $0
91 0197 INSTITUTE OF ED SCIENCES, RECOVERY ACT $250,000,000 $0 $0
91 0198 STUDENT AID ADMIN, RECOVERY ACT $60,000,000 $0 $0
91 0199 STUDENT FINANCIAL ASST, RECOVERY ACT $16,483,000,000 $198,901,281 $0
91 0207 INNOVATION & IMPROVEMENT, RECOVERY ACT $200,000,000 $0 $0
91 0299 SPECIAL EDUCATION, RECOVERY ACT $12,200,000,000 $5,970,012,399 $0
91 0302 REHAB SRVCS & DISABILITY RSRCH, RECOVERY ACT $680,000,000 $315,570,633 $0
91 0901 ED FOR THE DISADVANTAGED, RECOVERY ACT $13,000,000,000 $4,878,580,543 $0
91 1001 SCHOOL IMPROVEMENT PRG, RECOVERY ACT $720,000,000 $0 $0
91 1401 OFC OF INSPECTOR GENERAL, RECOVERY ACT $14,000,000 $0 $0
91 1909 ST FISCAL STABILIZATION FUND, RECOV ACT $53,600,000,000 $0 $0
Total $97,407,000,000 $11,363,064,856 $0

You'll see that the total amount obligated and disbursed match what's listed on the web.  What my previous post  is trying to get at is

1) how to get an up-to-date list of all these accounts (there are 12 listed for education here, but in a tally I'm working on, there are 14)

and

2) what the the appropriation for each account is.  I'm happy to see the total appropriation for Dept of Ed as $97,407,000,000 — since it matches what ProPublica lists at http://www.propublica.org/ion/stimulus/item/recovery.gov-falling-short-of-expectations-so-far-090331 — not to mention statements like "The American Recovery and Reinvestment Act of 2009 (ARRA) provides approximately $100 billion for education" (http://www.ed.gov/policy/gen/leg/recovery/implementation.html).

Once I have an accurate list of TAFS (e.g., 91-1909 for the State fiscal stabilization fund = $53.6 billion), then I'm use that list to slot the spending data.

July 29th, 2009

Does anyone know of a complete and up-to-date list of Recovery Act accounts?

Does anyone know of a complete and up-to-date list of Recovery Act TAFS — basically a list of all the basic accounts of money flowing from the Recovery Act?  There was one published by ProPublica on April 1, 2009 (from the post Recovery.gov Falling Short of Expectations So Far – ProPublica ) and one buried in spreadsheets coming from the feds (e.g., http://www.recovery.gov/sites/default/files/financial_and_activity_report_20090512USAID.xls in the worksheet entitled "92_AARP_TAFS_DD_Detail") .  I've been working on synthesizing the two lists and  updating them with the latest appropriation numbers that we can glean from scrapes of recovery.gov.

I'm close to arriving at a list that I'm happy with.  However, this is the type of list that the feds must have, but one I've not been able to find.  Anyone know of one?

April 28th, 2009

Participating in the national online dialogue around recovery.gov

Yesterday, I wrote a story on ProgrammableWeb (An Online Dialogue to Shape Recovery.gov) to educate readers on recovery.gov (the government website aimed to let American track the spending of money arising from the  American Recovery and Reinvestment Act of 2009 — the "Stimulus Package")   and to draw attention to a “national dialogue” this week (until May 3) to solicit ideas aimed at answering the key question:

What ideas, tools, and approaches can make Recovery.gov a place where all citizens can transparently monitor the expenditure and use of recovery funds?

I've been reading some of the ideas presented so far and voted on a couple.  I added comments to two so far.   In response to the proposal XML Web Services ("Make recovery data available as a web service via SOAP XML."), I wrote:

I agree that some type of rigorous programmatic interface that allows developers to access the data from recovery.gov is essential. I think that SOAP and associated the rest of WS-* stack might be one way to implement such access mechanisms, but I would not want SOAP to the exclusive protocol used. I would argue, for instance, that a RESTful approach is also an excellent alternative to consider for recovery.gov.

On a front closer to what our work has been about, in response to Making stimulus spending data accessible to the public, I wrote

I'm one of the Berkeley researchers mentioned above involved with making recommendations on how data feeds should be use to make the recovery more transparent (see http://www.ischool.berkeley.edu/newsandevents/news/20090417recoveryguidelines and http://isd.ischool.berkeley.edu/stimulus/2009-029/)

Although some (but not all) agencies receiving and dispersing recovery funds are using feeds in their reporting (see a list that we compiled at http://isd.ischool.berkeley.edu/stimulus/feeds/feeds.html), the best data on dollars appropriated, obligated, or spent is in the Excel spreadsheets. Although there are apparently templates for the reports, they keep changing format and there's nothing to stop agencies from inserting extra fields or omitting other fields. We know this for a fact since we've written programs to scrape the data from the spreadsheets and find it a challenge to keep up with changes that keep breaking our scripts.

The federal government should made the data in the form of XML feeds in the first place (backed by a schema so that we can check that the data is valid), instead of making people who want to use that data scrape it out of Excel in a highly fragile process.

As I wrote yesterday, it will be interesting to see how well the recovery.gov site actually does at aggregating a large number of proposals and surfacing the best ones. Moreover,

April 28th, 2009

Tracking the stimulus/recovery in the news

Over the last couple of months, I've been studying the Stimulus through the lens of the weekly reports published on recovery.gov.   My colleagues Erik Wilde and Eric Kansa (at the School of Information at UC Berkeley) and I  made recommendations on how data feeds should be used to foster transparency around stimulus data,  in addition to developing prototypes of the types of visualizations one could do with such data feeds.   We're continuing work on that front, specifically scraping data currently found in Excel and transforming that data into XML (Atom) feeds.

It is much easier to transform the financial data into visualizations and analyses, once it is in the form of feeds (rather than Excel).   The federal government should made the data in the form of  XML in the first place (backed by a schema so that we can check that the data is valid),  instead of making people who want to use that data scrape the data out of Excel in a highly fragile process.

To discern the meaning of the data we are extracting from various government sites,  I am now trying to keep up with the news around the recovery.  Here are some of the sources I've been tracking so far:

This list represents my current starting points.  I naturally expect to find a lot of other useful sources as I go along.

April 6th, 2009

New OMB guidelines issue for recovery tracking

I will have to get cracking on studying the new Updated Implementing Guidance for the American Recovery and Reinvestment Act of 2009, which came out last Friday, April 3. Here's the news report from recovery.gov on this new set of guidelines:

On April 3, 2009, the Office of Management and Budget (OMB) published Implementing Guidance for the American Recovery and Reinvestment Act of 2009 ("Recovery Act"). This is the second installment of detailed government-wide guidance for carrying out programs and activities enacted in the Recovery Act. This updated guidance supplements, amends and clarifies the initial guidance issued by OMB on February 18, 2009 (Initial Implementing Guidance for the American Recovery and Reinvestment Act of 2009, M-09-10). Updates to the guidance are based on ongoing input received from the public, Congress, state and local government officials, grant and contract recipients and federal personnel.

March 7th, 2009

Some questions about the implementation guidelines for the recovery feeds

A project that Erik Wilde and Eric Kansa (colleagues at the School of Information at Berkeley) and I have started tackling is tracking the flow of money from the Stimulus Package (aka the  American Recovery and Reinvestment Act of 2009).  The Obama Administration has set up recovery.gov to "feature information on how the Act is working, tools to help you hold the government accountable, and up-to-date data on the expenditure of funds."   Recovery.gov refers to some implementation details:

To meet these objectives, the President is directing Federal agencies to take critical steps in preparation for the Act’s implementation.  See here, for the White House’s February 9, 2009 initial implementation memorandum and February 18 detailed guidance memorandum.

The detailed memorandum (p. 56) issues the following requirement involving the use of web feeds :

For each of the near term reporting requirements (major communications, formula block grant allocations, weekly reports) agencies are required to provide a feed (preferred: Atom 1.0, acceptable: RSS) of the information so that content can be delivered via subscription.

Erik has been leading our efforts in making sense of the implementation memoranda — you can track his findings in a series of blog posts (listed in reverse chronological order, of course):

As we've read through the memoranda, we've been confused by a variety of matters, ones which we can hope others can help us with:

  1. On p. 56,  you find "Note that the body of the email should include the appropriate completed template as an attachment and should include the name, title, and contact information for the submitter. Templates for these files can be found at https://max.omb.gov/community/x/doC2Dw"  The URL requires a password to access.  Assuming that the template is not some state secret, can someone make those templates available to the public?
  2. Does anyone know of a way to get a list of all "federal block grant programs" relevant to the Stimulus Package?
  3. How to get a complete list of  CFDA Program Number?  One way is to go to http://www.cfda.gov/ and download the latest 2000+ page pdf catalog and scrape the list.  Is there an easier way?
  4. How to get a list of all Treasury Account Symbols (TAS)?  I think I found a list at http://www.fms.treas.gov/fastbook/, specifically the FAST book (in Word format).  I looked for one of the TAS mentioned in a  stimulus feed from the DOJ 15-0402-OJP and couldn't find that TAS specifically.  I'm not surprised since this might be a new TAS — but should we expect to start seeing the new TAS in the FAST book?
  5. Anyone help us decipher Treasury Appropriation Fund Symbols (TAFS). TAFS seem to be a primary mechanism for agencies to distinguish recovery vs non-recovery spending in its reporting. On p. 6, we read  "Agencies must establish unique Treasury Appropriation Fund Symbols (TAFSs) in their financial systems for all Recovery Act funding, unless a waiver is granted by the Director of OMB by February 25th."   Can we get help to get that list of TAFSs?  On p. 32, we have "OMB will post the list of TAFSs on the Budget Execution and Recovery Funding page of the Budget Community; the URL is https://max.omb.gov/community/x/-4BeDw "  Is there any reason that list should be available only to government employees?
  6. Can someone show how to map the detailed list of spending items in the Stimulus package (such as the analysis at ProPublica) to Treasury Account Symbols?

Any help with any of these questions would be greatly appreciated!

|