government

Slides for my talk on open government + Freebase

I'm looking forward to giving a little talk on open government + Freebase + Recovery Act data tonight at the Freebase meeting.   I'm deeply excited about the potential of open government data to change how we work with government (not to mention how we understand its workings).    Here are some slides that will help frame my talk tonight.

government
open data

ARRA Treasury Account Symbols: the outcome of our FOIA request

In July, I wrote about why I've been looking for Recovery TAFS and appropriations. In an attempt to get an official list from the US federal government, Eric Kansa and I sent a FOIA letter to OMB to request the release (in electronic form) of a complete and up-to-date list of all Recovery Act (ARRA) TAFS (Treasury Appropriation Fund Symbols). We had known of two out-of-date and potentially incomplete lists of the ARRA TAFS:

  1. the worksheet entitled "92_AARP_TAFS_DD_Detail" in the May 8, 2009 weekly report from USAID
  2. a pdf published by ProPublica on April 1, 2009.

We specifically asked for an up-to-date Excel spreadsheet with the same columns as the worksheet "92_AARP_TAFS_DD_Detail" — but with an explanation of what each of the columns meant.  We  also encouraged the OMB to make this data available on an ongoing basis as an XML document published on the OMB website and kept up to date, with an explanation of each field.

Last week, we got what we asked for: an Excel spreadsheet (see Internet Archive metadata), which I've also uploaded as a Google spreadsheet. Note the description of the spreadsheet in the first sheet:

In a letter dated August 24 to OMB's Freedom of Information Officer, you requested that OMB provide you with an up-to-date Excel spreadsheet with the same columns as a worksheet you emailed on October 16. The Berk_FOIA_Data tab in this Excel file provides up-to-date information using the same columns in the file you sent. The information is up-to-date as of October 19, 2009, and shows a list of each Treasury Appropriation Fund Symbol (TAFS) associated with the Recovery Act (RA). Below is a description of each column in the Berk_FOIA_Data tab.

I've not had an opportunity to complete my analysis of the FOIA spreadsheet and to correlate the data to the recipient reporting. You'll note that there are 342 TAFS in the spreadsheet. To derive a list of Treasury Account Symbols (TAS as opposed to TAFS), we concatenate the Treasury Agency Code with the Treasury Bureau Code (separated by a '-') and bundle all the corresponding TAFS. See the resulting list, with a total of 313 TAS. You'll note that a spreadsheet that lists the TAS as of Sept 13, 2009 has 309 symbols, while the HTML list on federalreporting.gov currently lists 327 TAS (along with 32 place-holder symbols). The differences among those lists are something to nail down next. At any rate, even something like the list of Treasury Accounts associated with the Recovery Act is more fluid than I would have expected at this point.
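The TAS derivation described above can be sketched in a few lines of Python. This is only an illustration: the field names (agency_code, bureau_code, tafs) and the sample rows are made up for the example and are not the column names or values in the FOIA spreadsheet.

```python
from collections import defaultdict


def group_tafs_by_tas(rows):
    """Bundle TAFS under a TAS key of the form '<agency code>-<bureau code>'."""
    groups = defaultdict(list)
    for row in rows:
        # Concatenate the two codes, separated by a '-', to form the TAS key.
        key = f"{row['agency_code']}-{row['bureau_code']}"
        groups[key].append(row["tafs"])
    return dict(groups)


# Made-up rows for illustration only; not taken from the FOIA spreadsheet.
sample_rows = [
    {"agency_code": "00", "bureau_code": "1111", "tafs": "TAFS-A"},
    {"agency_code": "00", "bureau_code": "1111", "tafs": "TAFS-B"},
    {"agency_code": "00", "bureau_code": "2222", "tafs": "TAFS-C"},
]

tas_groups = group_tafs_by_tas(sample_rows)
print(len(sample_rows), "TAFS bundled into", len(tas_groups), "TAS")
```

Run against the real spreadsheet rows, the same grouping is what would turn the 342 TAFS into a shorter list of TAS.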

One thing that has puzzled me is why there are so many TAFS with $0.00 for the treasury warrant.  You find an explanation in the FOIA spreadsheet:

Treasury Warrant is the sum that Treasury warranted to the TAFS. You can think of a warrant as being the initial deposit in a new checking account. For many of the TAFSs on the list, you can track the amounts appropriated in the law to the amount of the Treasury warrant. In some cases, however, you cannot track back to actual amounts because the funding in the law is formula based. In many cases, a TAFS has a zero in the Treasury Warrant column. The primary reason for this is that these TAFSs receive RA funds via a transfer from other TAFSs.

Hmmm.  We're going to have to understand the relevant formulas.

Acknowledgement:  A big thanks to Brian Carver for providing us valuable advice on how to formulate, draft and send a FOIA request and helping us to interpret what's happening during a FOIA process.

government
recovery.gov tracking

Web Services for Recovery.gov

Today, my colleagues Erik Wilde, Eric Kansa, and I are pleased to announce our new report "Web Services for Recovery.gov" and its companion website recovery.berkeley.edu. Last week, the redesign of Recovery.gov was made public to much fanfare. Recovery.gov is the U.S. government’s official website for publicly documenting how funds from the American Recovery and Reinvestment Act of 2009 (ARRA) have been allocated and spent. Our work focuses on a crucial aspect of Recovery.gov that has yet to receive sufficient attention, namely, how data on Recovery Act spending will be made available in machine-readable form for analysis, interpretation, and visualization by third-party applications. In our report and on our website, we propose a reporting architecture, provide some sample feeds based on that architecture, and demonstrate how that data could be used in a simple map-based mashup.

Here are some highlights from our report, which I quote (with a bit of editing):

  • Design priorities for recovery.gov need to shift from focusing on deploying an attractive Web site toward designing ARRA web services to support reuse of data in third-party applications.
  • These services should allow any party  to receive the complete set of ARRA reporting data in a timely and easily usable manner, so that in principle, the full functionality of Recovery.gov could be replicated by a third party.
  • Our proposed architecture is based on the principles of Representational State Transfer (REST) and always attempting to use the simplest and most widely known and supported technology for any given task.
  • We recommend the feed-based dissemination of ARRA reporting data using the most widely used technologies on the Internet today: HTTP for service access, Atom for the service interface, and XML for the data provided by the service. This approach allows access from sophisticated server-based applications or from resource-constrained devices such as mobile phones.
  • The manner in which data flows from FederalReporting.gov to Recovery.gov is of critical importance. Ideally, Recovery.gov should use Web services offered by FederalReporting.gov.
  • We strongly recommend that Recovery reporting systems adopt the Atom syndication format for feeds.  Feeds represent a major positive development in making government data more open to citizen review and reuse and provide a unique ability to do so by merging utility for humans as well as machines.
  • While not formally standardized, feed autodiscovery is well supported by current browsers and could be implemented reliably with a well-defined set of implementation guidelines for Web pages offered by Recovery.gov.
  • We strongly recommend making feed paging and archiving mandatory, so that the feeds are not just a temporary way of communicating that information has become available. Instead, the feed pages should be available as persistent and permanent access points, so that accessing information via feeds can be done robustly and reliably.
  • ARRA data dissemination services should be more resource-oriented than service-oriented.  XML representations should contain links (in the form of URIs) to related data resources, thereby representing the relationships between the different concepts which are relevant for reporting.
  • The Recovery reporting schema uses many different coding systems and identifiers. Publication of resources related to some of these identifiers will be of great value.  (We list key identifiers in the report.)
  • There are many possible analyses that people may wish to perform on Recovery data,  making it difficult  to accommodate them all. Therefore, querying services should be oriented toward making machine-readable representations of data available, so that third party developers can easily populate their own analysis engines and run their own specialized algorithms on that data.
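To make the feed paging recommendation above concrete, here is a minimal Python sketch of a client that walks a paged Atom feed by following rel="next" links. It is not code from the report: the example.org URLs and the two in-memory feed pages are hypothetical stand-ins, and the fetch argument is an assumed helper that a real client would replace with an HTTP request.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def iter_entry_titles(first_url, fetch):
    """Yield entry titles from every page, following rel="next" links."""
    url = first_url
    seen = set()
    while url and url not in seen:  # guard against link loops
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            yield entry.findtext(ATOM + "title")
        # Look for the link to the next page, if there is one.
        url = None
        for link in feed.findall(ATOM + "link"):
            if link.get("rel") == "next":
                url = link.get("href")
                break


# Hypothetical feed pages, held in memory so the sketch runs without a network.
PAGES = {
    "http://example.org/feed/1": (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="next" href="http://example.org/feed/2"/>'
        "<entry><title>Report A</title></entry>"
        "<entry><title>Report B</title></entry>"
        "</feed>"
    ),
    "http://example.org/feed/2": (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        "<entry><title>Report C</title></entry>"
        "</feed>"
    ),
}

titles = list(iter_entry_titles("http://example.org/feed/1", PAGES.__getitem__))
print(titles)
```

Because each page is a persistent resource with its own URI, a third party can rebuild the complete data set simply by following links until none remain.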

Erik Wilde has also commented on our report. We welcome and look forward to your feedback.

Finally, we are grateful to the Sunlight Foundation for a grant that helped to support this effort.

government
recovery.gov tracking
UC Berkeley

plotting data for counties on Google Maps: Part I

There is a huge amount of government and socio-economic data in general gathered at the county level. It would be nice to be able to plot that data on a desktop or online map (e.g., Google Maps). This morning I posted a question on the Sunlight Labs mailing list asking for some help:

I would like to display US counties on a Google map based on some scalar value (e.g., population) for each county and a color map that associates values to colors. Does anyone know of a library that makes this easy to do? (I'm interested in doing the same for other administrative regions, such as zip codes and congressional districts.)

(http://groups.google.com/group/Google-Maps-API/browse_frm/thread/fbc9266d4144e8fd/dbf74647b8baf8d1 contains a good discussion of the topic — and I have found other references that might be helpful,  but I have not seen the functionality I'm looking for distilled down into an easy-to-use library.)
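As a rough illustration of the color-map idea in the question above, here is a minimal Python sketch that turns a scalar value into a hex color by linear interpolation between two endpoint colors. The function name, the endpoint colors, and the county values are all made up for the example; a real map would more likely use a tested palette and binned classes.

```python
def value_to_color(value, vmin, vmax, low=(255, 255, 204), high=(128, 0, 38)):
    """Linearly interpolate between two RGB colors and return '#rrggbb'."""
    if vmax == vmin:
        t = 0.0
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Made-up values keyed by placeholder county names.
county_values = {"county-a": 0, "county-b": 2500, "county-c": 10000}
vmin, vmax = min(county_values.values()), max(county_values.values())
county_colors = {
    name: value_to_color(v, vmin, vmax) for name, v in county_values.items()
}
print(county_colors)
```

The resulting dictionary is the piece a rendering step would need: one fill color per county polygon.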

Building a ground overlay

When I tweeted my question, I got a very helpful response from Sean Gillies:

That's a lot of polygons (3489, see http://sgillies.net/blog/870/a-more-perfect-union-continued/) to draw in the browser. Make an image layer with OpenLayers?

Sean confirmed what I had been thinking: I would have to compute a static image to use as an overlay, since drawing 3000+ polygons would slow down Google Maps prohibitively. In fact, in many ways, I've been trying to use the approach I've seen in the demo gallery of the Google Maps API v3: John Coryat's ProjectedOverlay example, which "uses OverlayView to render an image inside a given bounding box (LatLngBounds) on top of the map". (You can look at the overlay image (.png) directly and reuse ProjectedOverlay.js.)

So one approach would be to calculate a png of the counties (colored appropriately), and this png would provide an efficient way to display county data. I had started down this road a while ago; Sean's post gave me more direct guidance on how to create a useful Python-based desktop GIS setup to handle such tasks as creating my desired map in png form. To be honest, I've found the whole open source GIS world fairly confusing. I bought and read part of Gary Sherman's Desktop GIS: Mapping the Planet with Open Source Tools (illustrated edition, Pragmatic Bookshelf, 2008) and was considering installing FWTools, GRASS GIS, and Quantum GIS. Sean's post alerted me to OSGeo.org and convinced me to try OSGeo4W, which is

a binary distribution of a broad set of open source geospatial software for Win32 environments (Windows XP, Vista, etc). OSGeo4W includes GDAL/OGR, GRASS, MapServer, OpenEV, uDig, QGIS as well as many other packages (about 70 as of summer 2008).

I installed OSGeo4W but have not been able to figure out the Python bindings (and hence can't yet try out the code that Sean posted). Neither has the Python setup from FWTools 2.4.3 worked for me. My next step is to follow the instructions at Python Package Index : GDAL 1.6.1 to see whether I'll have better luck.

Joshua Tauberer's WMS service

Joshua Tauberer of Govtrack.us responded to my query by referring me to his experimental WMS service, which produces WMS layers for entities ranging from Congressional and state districts to counties. I modified one of the examples to try to plot the counties. For some reason, not all the counties show up yet. Still, this approach is very promising since it would save me the work of calculating the coordinates of the county boundaries to begin with. I have to come back to study and apply the techniques documented at WMS Server API Documentation.
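For readers who have not used WMS before, here is a minimal Python sketch of how a generic WMS 1.1.1 GetMap request URL is put together. The base URL and the layer name are hypothetical placeholders, not the actual endpoint or layer names of the Govtrack.us service; its own API documentation is the authority on what it accepts.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height):
    """Build a WMS 1.1.1 GetMap URL for one layer as a transparent PNG."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",  # plain longitude/latitude coordinates
        "BBOX": ",".join(str(c) for c in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url(
    "http://example.org/wms",
    "counties",
    (-125.0, 24.0, -66.0, 50.0),  # rough bounding box of the lower 48 states
    1024,
    512,
)
print(url)
```

The image returned for such a URL is what gets laid over the base map, so the server, not the browser, does the work of drawing the polygons.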

Other things to study further

Google
government
mashups

I'm looking forward to Transparency Camp 2009

I'll be at TransparencyCamp 2009 tomorrow. (You can follow the conference tweets at #tcamp09, whether or not you'll be in physical attendance.) Since TCamp09 is an unconference, any formal agenda will be determined at the conference by the sessions that attendees propose there. I'd like to see and attend sessions on the following topics:

  • projects/techniques to track the finances of the US Government. I've been working on tracking the Recovery Act (aka the Stimulus) and would like to compare notes with others involved in understanding how budgets are created, and how money is allocated and spent, at the federal level.
  • projects/techniques on how to generate an ontology or mapping of the structures of the federal and state governments (e.g., how would we map the US Government Manual into structured machine-readable form?)
  • I'd love to hear Joshua Tauberer tell us about govtrack.us and Carl Malamud about public.resource.org
  • business/sustainability models around government transparency projects. I'd like to devote more time to government transparency, but how do we pay the bills?

government

A clarification of why I'm looking for Recovery TAFS and appropriations

On a mailing list, someone responded to my query Does anyone know of a complete and up-to-date list of Recovery Act accounts? by asking why I was looking for amounts appropriated, and not just obligated and spent, for the Recovery Act. I wrote the following clarification (which I have edited lightly):

In addition to the amount of money that is obligated and spent, isn't there also the amount of money that is appropriated? The amount obligated and spent goes up, but isn't the appropriation supposed to be the maximum that the obligated and spent amounts ever reach? (I'm an accounting newbie, so correct me if I misunderstand what these terms mean.) What I'm trying to understand right now are statements like "ARRA is a $787 billion dollar bill" and the Department of Education is getting "$100 billion". Specifically, I'd like to see how various line items add up to the totals quoted.

The amounts obligated used to be reported in the weekly Excel spreadsheets from the agencies. For example, consider the April 3 report from the Department of Ed:

http://www.recovery.gov/?q=content/weekly-report&agency_code=91&agency=&startdate=2009-04-03&noofreports=2&summarytype=&report_id=146&nex=

and the corresponding spreadsheet:

http://www.recovery.gov/sites/default/files/weeklyreport_WR20090403ED.xls

At http://www.recovery.gov/?q=content/weekly-report&agency_code=91&agency=&startdate=2009-04-03&noofreports=2&summarytype=&report_id=146&nex=, we're told that:

  • Total Available: $11,363,064,856
  • Total Paid Out: $0

The spreadsheet (specifically the "Weekly Update" worksheet) actually supports this statement — here, I copy the table and add the totals line.

| Program Source/Treasury Account Symbol: Agency Code | Program Source/Treasury Account Symbol: Account Code | Program Source/Treasury Account Symbol: Sub-Account Code (OPTIONAL) | Program Description (Account Title) | Total Appropriation | Total Obligations | Total Disbursements |
|---|---|---|---|---|---|---|
| 91 | 0103 | | IMPACT AID, RECOVERY ACT | $100,000,000 | $0 | $0 |
| 91 | 0196 | | HIGHER EDUCATION, RECOVERY ACT | $100,000,000 | $0 | $0 |
| 91 | 0197 | | INSTITUTE OF ED SCIENCES, RECOVERY ACT | $250,000,000 | $0 | $0 |
| 91 | 0198 | | STUDENT AID ADMIN, RECOVERY ACT | $60,000,000 | $0 | $0 |
| 91 | 0199 | | STUDENT FINANCIAL ASST, RECOVERY ACT | $16,483,000,000 | $198,901,281 | $0 |
| 91 | 0207 | | INNOVATION & IMPROVEMENT, RECOVERY ACT | $200,000,000 | $0 | $0 |
| 91 | 0299 | | SPECIAL EDUCATION, RECOVERY ACT | $12,200,000,000 | $5,970,012,399 | $0 |
| 91 | 0302 | | REHAB SRVCS & DISABILITY RSRCH, RECOVERY ACT | $680,000,000 | $315,570,633 | $0 |
| 91 | 0901 | | ED FOR THE DISADVANTAGED, RECOVERY ACT | $13,000,000,000 | $4,878,580,543 | $0 |
| 91 | 1001 | | SCHOOL IMPROVEMENT PRG, RECOVERY ACT | $720,000,000 | $0 | $0 |
| 91 | 1401 | | OFC OF INSPECTOR GENERAL, RECOVERY ACT | $14,000,000 | $0 | $0 |
| 91 | 1909 | | ST FISCAL STABILIZATION FUND, RECOV ACT | $53,600,000,000 | $0 | $0 |
| Total | | | | $97,407,000,000 | $11,363,064,856 | $0 |
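The totals line can be checked mechanically. The short Python sketch below re-enters the appropriation and obligation figures from the table (keyed by agency code and account code) and sums them; the variable names are just for this example.

```python
# (account, total appropriation, total obligations), copied from the table.
ACCOUNTS = [
    ("91-0103", 100_000_000, 0),
    ("91-0196", 100_000_000, 0),
    ("91-0197", 250_000_000, 0),
    ("91-0198", 60_000_000, 0),
    ("91-0199", 16_483_000_000, 198_901_281),
    ("91-0207", 200_000_000, 0),
    ("91-0299", 12_200_000_000, 5_970_012_399),
    ("91-0302", 680_000_000, 315_570_633),
    ("91-0901", 13_000_000_000, 4_878_580_543),
    ("91-1001", 720_000_000, 0),
    ("91-1401", 14_000_000, 0),
    ("91-1909", 53_600_000_000, 0),
]

total_appropriation = sum(appropriation for _, appropriation, _ in ACCOUNTS)
total_obligations = sum(obligations for _, _, obligations in ACCOUNTS)

print(f"Total appropriation: ${total_appropriation:,}")
print(f"Total obligations:   ${total_obligations:,}")
```

The two sums come out to $97,407,000,000 and $11,363,064,856, which agree with the totals line in the table.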

You'll see that the total amount obligated and disbursed match what's listed on the web.  What my previous post  is trying to get at is

1) how to get an up-to-date list of all these accounts (there are 12 listed for education here, but in a tally I'm working on, there are 14)

and

2) what the appropriation for each account is. I'm happy to see the total appropriation for Dept of Ed as $97,407,000,000, since it matches what ProPublica lists at http://www.propublica.org/ion/stimulus/item/recovery.gov-falling-short-of-expectations-so-far-090331, not to mention statements like "The American Recovery and Reinvestment Act of 2009 (ARRA) provides approximately $100 billion for education" (http://www.ed.gov/policy/gen/leg/recovery/implementation.html).

Once I have an accurate list of TAFS (e.g., 91-1909 for the State fiscal stabilization fund = $53.6 billion), I'll use that list to slot the spending data.

government
recovery.gov tracking

Does anyone know of a complete and up-to-date list of Recovery Act accounts?

Does anyone know of a complete and up-to-date list of Recovery Act TAFS, basically a list of all the basic accounts of money flowing from the Recovery Act? There was one published by ProPublica on April 1, 2009 (from the post Recovery.gov Falling Short of Expectations So Far – ProPublica) and one buried in spreadsheets coming from the feds (e.g., http://www.recovery.gov/sites/default/files/financial_and_activity_report_20090512USAID.xls in the worksheet entitled "92_AARP_TAFS_DD_Detail"). I've been working on synthesizing the two lists and updating them with the latest appropriation numbers that we can glean from scrapes of recovery.gov.

I'm close to arriving at a list that I'm happy with.  However, this is the type of list that the feds must have, but one I've not been able to find.  Anyone know of one?

government
recovery.gov tracking

A first pass at an org chart for the US Federal government

Ever since I started trying to understand how the US Government works, I've been trying to find a chart that would list all the different departments, agencies, and other organizational entities that comprise the government and show how they are related to each other. I can't believe that I'm the only person who would find such an org chart useful; indeed, this idea is echoed in a project idea listed on the Sunlight Labs wiki as OPML the Federal Government:

Project Idea: This is a quick win– just create an OPML file of the existing structure of the Federal Government agencies in all branches.

As a step to creating such a representation, I've scraped the data in

Appendix C of OMB Circular No. A-11 (Sept 2008): http://www.whitehouse.gov/omb/circulars/a11/current_year/app_c.pdf

Under the MAX system, OMB assigns agency and bureau codes that are used to identify and access data in the budget database. The following table lists these codes in budget order. It also provides the corresponding agency codes assigned by Treasury. In certain instances, a different Treasury agency code may be used for some accounts in an agency; a complete listing can be found in the Budget Accounts Title (BAT) file.

I've uploaded this PDF to scribd to make it easier for readers to see the data the pdf has:

Source
OMB Circular a 11 Appendix c

I read this PDF into Adobe Acrobat 8, saved it as "XML 1.0", massaged the XML a bit by hand to make it easier to apply some XQuery to create a starter OPML 1.0 file, and then did some more manual editing to represent the data in the correct hierarchy to produce:

http://labs.dataunbound.com/doc/2009/06/OMB_A_11_C.xml
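For anyone who wants to build a similar file programmatically, here is a minimal Python sketch that writes an agency/bureau hierarchy as nested OPML outline elements. It is not the Acrobat and XQuery process described above. The sample agency, its codes, and the attribute names (ombAgencyCode, treasuryAgencyCode, ombBureauCode) are placeholders invented for this example, not values from Appendix C or attributes used in the file linked above.

```python
import xml.etree.ElementTree as ET


def build_opml(title, agencies):
    """Return an OPML 1.0 document with agencies and bureaus as nested outlines."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for agency in agencies:
        agency_node = ET.SubElement(
            body,
            "outline",
            text=agency["name"],
            ombAgencyCode=agency["omb_code"],
            treasuryAgencyCode=agency["treasury_code"],
        )
        for bureau in agency["bureaus"]:
            ET.SubElement(
                agency_node,
                "outline",
                text=bureau["name"],
                ombBureauCode=bureau["code"],
            )
    return ET.tostring(opml, encoding="unicode")


# Placeholder data for illustration only; not real agency or bureau codes.
sample_agencies = [
    {
        "name": "Example Department",
        "omb_code": "000",
        "treasury_code": "00",
        "bureaus": [
            {"name": "Example Bureau A", "code": "01"},
            {"name": "Example Bureau B", "code": "02"},
        ],
    }
]

opml_text = build_opml("Example org chart", sample_agencies)
print(opml_text)
```

Keeping the codes as attributes on each outline element means an outline browser can still show the names, while a script can pull out the codes for joining against budget data.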

My working assumption is that OMB Agency/Bureau codes + Treasury Agency Codes provide the key to unlocking a significant part of the higher levels of the US Federal Government. More on this assumption later.

You can see this OPML rendered by optimalbrowser.com as such:

Some Possible Next Steps:

government

copyright status of White House photos on Flickr?

On the "noise list" at the School of Information at Berkeley, we recently got into a discussion about the copyright status of the Official White House Photostream on Flickr. Some of us would agree with the argument presented on the blog of the Creative Commons (Why Did the White House Choose Attribution and not Public Domain?) that

The photos are likely in the public domain because they are works created by the federal government and not entitled to copyright protection. As you might recall, the Whitehouse.gov’s copyright notice indicates as much.

At present, Flickr doesn't allow an ordinary user to state that one of his pictures is in the public domain. I've been waiting for CC0 to be added to the list of CC licenses you can use. Of course, the White House isn't necessarily a regular joe user, and there is already a structure in Flickr to handle public domain-ish photos: the Flickr Commons with its "no known copyright restrictions" provision. Perhaps putting White House photos in Flickr Commons won't quite work either. Would one be able to put images produced by the US government into the Flickr Commons in general?

BTW, one of the comments on the Creative Commons post pointed to http://www.flickr.com/people/whitehouse/ in which we find the following stipulations:

These official White House photographs are being made available for publication by news organizations and/or for personal use printing by the subject(s) of the photographs. The photographs may not be used in materials, advertisements, products, or promotions that in any way suggest approval or endorsement of the President, the First Family, or the White House.

Given that these photos are arguably in the public domain (as argued in the creativecommons.org blog post), are these stipulations legally enforceable?

copyright
creative commons
government

Congressional Oversight Panel, TARP, and Elizabeth Warren

I wish I had time to follow the TARP carefully; following the Stimulus already keeps me busy enough. Still, I learned a lot from Jon Stewart's April 15 interview with Elizabeth Warren, the head of the Congressional Oversight Panel: Part 1 and Part 2.

government
