Previous recommendations would say "open the data" too

As many have jumped into making recommendations on how Recovery data should be packaged and disseminated, I'm reminded of some important previous work in this area.

The first is the ACM U.S. Public Policy Committee (USACM) Recommendations on Open Government. I have tremendous respect for the ACM as "the world’s largest educational and scientific computing society". USACM "serves as the focal point for ACM's interaction with U.S. government organizations, the computing community, and the U.S. public in all matters of U.S. public policy related to information technology." The policy statement on "open government" first sets the context for its recommendations:

Individual citizens, companies and organizations have begun to use computers to analyze government data, often creating and sharing tools that allow others to perform their own analyses. This process can be enhanced by government policies that promote data reusability, which often can be achieved through modest technical measures. But today, various parts of governments at all levels have differing and sometimes detrimental policies toward promoting a vibrant landscape of third-party web sites and tools that can enhance the usefulness of government data.

The recommendations "for data that is already considered public information" are:

  • Data published by the government should be in formats and approaches that promote analysis and reuse of that data.
  • Data republished by the government that has been received or stored in a machine-readable format (such as online regulatory filings) should preserve the machine-readability of that data.
  • Information should be posted so as to also be accessible to citizens with limitations and disabilities.
  • Citizens should be able to download complete datasets of regulatory, legislative or other information, or appropriately chosen subsets of that information, when it is published by government.
  • Citizens should be able to directly access government-published datasets using standard methods such as queries via an API (Application Programming Interface).
  • Government bodies publishing data online should always seek to publish using data formats that do not include executable content.
  • Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.
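To make the last few recommendations concrete, here is a minimal sketch of what a third party might do with a published dataset: check its integrity against a published digest, then process it as machine-readable data. The CSV contents, column names, and the `verify_and_load` helper are all hypothetical and invented for illustration; none of them come from the USACM statement. Note also that a bare SHA-256 digest only checks integrity. It is not a digital signature and does not by itself establish authenticity.

```python
import csv
import hashlib
import io

# Hypothetical published dataset (invented values, for illustration only).
PUBLISHED_CSV = (
    "award_id,agency,amount_usd\n"
    "A-001,Example Agency,1500000\n"
    "A-002,Example Agency,250000\n"
)


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_and_load(raw: bytes, expected_digest: str) -> list:
    """Check raw bytes against a published digest, then parse them as CSV.

    Raises ValueError if the digest does not match, so altered or
    corrupted data is never parsed.
    """
    if sha256_hex(raw) != expected_digest:
        raise ValueError("digest mismatch: data differs from what was published")
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    return list(reader)


raw = PUBLISHED_CSV.encode("utf-8")
# Stand-in for a digest the publisher would post alongside the file.
published_digest = sha256_hex(raw)

rows = verify_and_load(raw, published_digest)
total = sum(int(row["amount_usd"]) for row in rows)
print(total)  # 1750000
```

Because the data is plain CSV with no executable content, a few lines of standard-library code are enough to reuse it, which is the kind of modest technical measure the statement has in mind.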

The second is a set of Open Government Data Principles formulated in December 2007 by the Open Government Working Group, when "30 open government advocates gathered to develop a set of principles of open government data":

Government data shall be considered open if they are made public in a way that complies with the principles below:

1. Complete
All public data are made available. Public data are data that are not subject to valid privacy, security or privilege limitations.
2. Primary
Data are collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.
3. Timely
Data are made available as quickly as necessary to preserve the value of the data.
4. Accessible
Data are available to the widest range of users for the widest range of purposes.
5. Machine processable
Data are reasonably structured to allow automated processing.
6. Non-discriminatory
Data are available to anyone, with no requirement of registration.
7. Non-proprietary
Data are available in a format over which no entity has exclusive control.
8. License-free
Data are not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.

Compliance must be reviewable.

The third is the paper “Government Data and the Invisible Hand” (Yale Journal of Law & Technology 11: 160) by David Robinson, Harlan Yu, William Zeller, and Edward Felten. The abstract contains the following recommendation:

Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use….It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

In ProgrammableWeb last year, I distilled the paper's argument as follows:

The conclusion is based on a claim that the executive branch is comparatively ineffective at creating tools for presenting data and should therefore leave that work to a private sector (either nonprofit or commercial entities) that is best able to respond to a wide variety of possible uses for government data. That doesn’t mean that the government should provide no user interface to the data, but rather “should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data.” Fancier interfaces and tools should be built by others.

Moreover, the authors have recommended a specific mechanism for ensuring that the government does not privilege any user interface over their public data infrastructure: “require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.”

Let me now make sure that these recommendations are at least referenced somewhere in the "National Dialogue" around the Recovery.
