
Typographical or semantic irregularities at

Why are there two reports with the same date? This screenshot is from the reports from the Department of Labor on

pageid/curid as a unique id for Wikipedia pages

While learning how to program Freebase, I've come across links to Wikipedia that make use of a curid parameter.  For example,

is the same as

At least, the two pages seem to be the same thing as far as I can see.

How do we do a lookup between curid and the page title?  One way, if we're screen-scraping: the page source contains

var wgArticleId = "296716";

And if you go to the curid-based URL, the page gives plenty of indication of what the title is, including the permanent link.

To dig deeper, I might want to understand the mediawiki data structure and the mediawiki API.
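As a first step in that direction, here is a minimal Python sketch of the lookup in both directions. It assumes the page source carries the `var wgArticleId = "...";` line shown above, and it uses the `pageids` parameter of the MediaWiki `action=query` API. The function names and the sample response are my own, made up for illustration; the response is hand-written, not fetched from Wikipedia.

```python
import json
import re
from urllib.parse import urlencode

# Assumed endpoint for English Wikipedia's MediaWiki API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def extract_article_id(page_source):
    """Pull the curid out of scraped page source, or return None if absent."""
    match = re.search(r'var\s+wgArticleId\s*=\s*"?(\d+)"?\s*;', page_source)
    return int(match.group(1)) if match else None


def title_lookup_url(curid):
    """Build an API URL asking for the page whose pageid equals curid."""
    params = {"action": "query", "pageids": str(curid), "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def title_from_response(response_text, curid):
    """Read the title out of a query response shaped like SAMPLE_RESPONSE."""
    pages = json.loads(response_text)["query"]["pages"]
    return pages.get(str(curid), {}).get("title")


# Hand-written sample response with a placeholder title (illustration only).
SAMPLE_RESPONSE = (
    '{"query": {"pages": {"296716": '
    '{"pageid": 296716, "ns": 0, "title": "Example title"}}}}'
)

curid = extract_article_id('var wgArticleId = "296716";')
print(title_lookup_url(curid))
print(title_from_response(SAMPLE_RESPONSE, curid))
```

Going through the API rather than scraping avoids depending on the exact JavaScript variable layout of the page, which may change between MediaWiki versions.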


I'm confused: how to provide the proper attribution for a CC-license photo in Freebase?

I'm puzzled by how to provide the correct attribution for derivatives of Creative Commons-licensed works.  Does one have to track the entire provenance of the object?  I came across this problem when I wanted to upload a photo from Wikipedia to Freebase.  Here's how I posed my question on the Freebase general support board:

I'd like to upload the latest photo from (e.g., to but am in a quandary about how to do the proper attribution. The photo in question is a derivative (cropping + light adjustment) of — which is licensed under a CC-BY-SA license. If I want to use the Wikipedia photo (a deriv of the one in Flickr), who do I credit as the copyright holder? The uploader of the Flickr photo? ( if so, do I enter or watchwithkristin or Kristin Dos Santos) The Wikipedia? The wikipedia user who made the last derivative?


journalism as an antidote to information overload?

I think that there is certainly an important role for professional journalism, which can act as an invaluable filter. Overload! : CJR:

To win the war for our attention, news organizations must make themselves indispensable by producing journalism that helps make sense of the flood of information that inundates us all.

In the same issue of CJR is a call to visualize news data — Picture This : CJR in reference to the example at Metrics – In the Shadow of Foreclosures –

The paper almanac as a model for a core part of Freebase?

I just bought a copy of the 2009 New York Times Almanac last night and started wondering whether it would be a good idea to use the almanac format as a way of structuring some basic collections of facts/information you'd want to have in Freebase.

New OMB guidelines issued for recovery tracking

I will have to get cracking on studying the new Updated Implementing Guidance for the American Recovery and Reinvestment Act of 2009, which came out last Friday, April 3. Here's the news report from on this new set of guidelines:

On April 3, 2009, the Office of Management and Budget (OMB) published Implementing Guidance for the American Recovery and Reinvestment Act of 2009 ("Recovery Act"). This is the second installment of detailed government-wide guidance for carrying out programs and activities enacted in the Recovery Act. This updated guidance supplements, amends and clarifies the initial guidance issued by OMB on February 18, 2009 (Initial Implementing Guidance for the American Recovery and Reinvestment Act of 2009, M-09-10). Updates to the guidance are based on ongoing input received from the public, Congress, state and local government officials, grant and contract recipients and federal personnel.

LLC or S or C?

This morning, I wanted to make some progress on deciding on what type of incorporation I want to pursue for my new business. I spent time looking at

  • Fishman, Stephen. Working for Yourself: Law & Taxes for Independent Contractors, Freelancers & Consultants. 7th ed. NOLO, 2008.
  • Weiss, Alan. Getting Started in Consulting. Hoboken, N.J.: J. Wiley, 2004.
  • Pakroo, Peri. Small Business Start-Up Kit for California. 7th ed. NOLO, 2008.

My current conclusion: yes, incorporate, but I will need to consult an accountant  and an attorney to figure out which form to establish.  Factors that will play a role:

  • which is the best for a small business starting out — I don't want too much overhead
  • we need to be able to move between CA and PA
  • we need a way to handle health insurance issues
  • we need room for growth
  • how does business insurance figure in the mix

It's worth studying Fishman more closely to understand the tax differences between LLC, S, and C.

Wilde, Kansa, and Yee "Proposed Guideline Clarifications for American Recovery and Reinvestment Act of 2009"

Earlier in the week, Erik Wilde, Eric Kansa, and I published our technical report Proposed Guideline Clarifications for American Recovery and Reinvestment Act of 2009, a set of technical guidelines for how we think data about stimulus spending should be published, along with a prototype of what people could do with the data if it were published accordingly.  Here's the abstract of our report:

The Initial Implementing Guidance for the American Recovery and Reinvestment Act of 2009 provides guidance for a feed-based information dissemination architecture. In this report, we suggest some improvements and refinements of the initial guidelines, in the hope of paving the path for a more transparent and useful feed-based architecture. This report is meant as a preliminary guide to how the current guidelines could be made more specific and provide better guidance for providers and consumers of Recovery Act spending information. It is by no means intended as a complete or final set of recommendations.

The technical heart of the work is the set of XSD schemas for communications, formula block grant allocation, and weekly report feeds. But the most fun part is looking at how some fake data would appear, displayed in a mashup of Google Maps and Simile Timeline. We made up some data because at the time of analysis, there wasn't much in the way of real government data to use. We hope that situation will change soon.

What next for me on this front?  Revisiting the questions I posed in my March 7 post (Some questions about the implementation guidelines for the recovery feeds) to see whether I can now answer them.

working with the bioguide ID for congresspeople in Freebase

The Congressional Biographical Directory contains entries for every congressperson from 1774 to the present.  Each congressperson is associated with an identifier (a bioguide ID).  For example, the bioguide ID for Edward (Ted) Kennedy is K000105.  With this ID, you can determine the URL for the corresponding biographical directory entry; e.g., Kennedy's is

I would like to make use of the bioguide ID in interacting with Freebase with respect to congresspeople.

hit explore: to see

Outbound key(s):

key namespace
184136 /wikipedia/en_id
Ted_Kennedy /wikipedia/en
Edward_M$002E_Kennedy /wikipedia/en
Edward_Moore_Kennedy /wikipedia/en
Teddy_Kennedy /wikipedia/en
Edward_kennedy /wikipedia/en
Edward_M_Kennedy /wikipedia/en
EMK /wikipedia/en
Ed_Kennedy /wikipedia/en
Caroline_Bilodeau /wikipedia/en
aa1a62ca-f027-426e-810f-63556da55434 /authority/musicbrainz
ARTIST349855 /authority/musicbrainz/name
Edward_Kennedy /wikipedia/en
Ted_Kennedy$002FDraft_1 /wikipedia/en
Senator_Ted_Kennedy /wikipedia/en
ted_kennedy /en
The_Lion_of_the_Senate /wikipedia/en
Edward_Moore_$0022Ted$0022_Kennedy /wikipedia/en
K000105 /user/jamie/sunlight/bioguide_id
Cape_Cod_Orca /wikipedia/en

What's the MQL query to read all the keys for the topic?

{
  "id" : "/en/ted_kennedy",
  "key" : [{}]
}

Among the various keys returned, we get

{
  "namespace" : "/user/jamie/sunlight/bioguide_id",
  "type" : "/type/key",
  "value" : "K000105"
}

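Once the keys come back, picking out the bioguide ID is a matter of filtering by namespace. Here is a minimal Python sketch, assuming the result has already been parsed from JSON and has the shape shown above. The sample data is a hand-copied subset of the keys listed earlier, and the helper name `keys_in_namespace` is hypothetical. A real mqlread response would wrap this in an outer envelope that is not shown here.

```python
import json

# Hand-copied subset of the keys listed above (illustration only).
SAMPLE_RESULT = json.loads("""
{
  "id": "/en/ted_kennedy",
  "key": [
    {"namespace": "/wikipedia/en_id", "type": "/type/key", "value": "184136"},
    {"namespace": "/wikipedia/en", "type": "/type/key", "value": "Ted_Kennedy"},
    {"namespace": "/user/jamie/sunlight/bioguide_id", "type": "/type/key", "value": "K000105"}
  ]
}
""")

BIOGUIDE_NAMESPACE = "/user/jamie/sunlight/bioguide_id"


def keys_in_namespace(result, namespace):
    """Return every key value the topic has in the given namespace."""
    return [
        key["value"]
        for key in result.get("key", [])
        if key.get("namespace") == namespace
    ]


print(keys_in_namespace(SAMPLE_RESULT, BIOGUIDE_NAMESPACE))
```

Returning a list rather than a single value reflects that a topic can carry several keys in one namespace, as the many /wikipedia/en keys above show.
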
Keys are new to me — so I need to do a bit of learning right now.   Now, let's note the following

Let's now figure out how to write the bioguide ID for one of the senators without one:  Jeanne Shaheen (Jeanne Shaheen facts – Freebase). Her bioguide_id is S001181. Here's an MQL write query that writes the bioguide_id to Freebase:

{
  "id" : "/en/jeanne_shaheen",
  "key" : {
    "connect" : "insert",
    "namespace" : "/user/jamie/sunlight/bioguide_id",
    "type" : "/type/key",
    "value" : "S001181"
  }
}

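If there are many senators to fill in, the write query can be generated rather than typed by hand. Here is a minimal Python sketch that builds the same structure as the query above. The function name is hypothetical, and the ID check (one capital letter followed by six digits) is only an assumption inferred from the two examples in this post, K000105 and S001181. The sketch only builds the query; it does not send it to Freebase.

```python
import json
import re

BIOGUIDE_NAMESPACE = "/user/jamie/sunlight/bioguide_id"

# Assumed ID shape, inferred only from K000105 and S001181.
BIOGUIDE_PATTERN = re.compile(r"^[A-Z]\d{6}$")


def bioguide_write_query(topic_id, bioguide_id):
    """Build an MQL write query that attaches a bioguide ID key to a topic."""
    if not BIOGUIDE_PATTERN.match(bioguide_id):
        raise ValueError("unexpected bioguide ID format: %r" % bioguide_id)
    return {
        "id": topic_id,
        "key": {
            "connect": "insert",
            "namespace": BIOGUIDE_NAMESPACE,
            "type": "/type/key",
            "value": bioguide_id,
        },
    }


query = bioguide_write_query("/en/jeanne_shaheen", "S001181")
print(json.dumps(query, indent=2))
```
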
Things to figure out:  how to create keys in the first place, both in the Freebase UI and in MQL.  I think regular users can create keys, but I'm not aware of how to do so in the Freebase UI.  I didn't even see a way to insert the bioguide_id using the Freebase UI.


Some questions about the implementation guidelines for the recovery feeds

A project that Erik Wilde and Eric Kansa (colleagues at the School of Information at Berkeley) and I have started tackling is tracking the flow of money from the Stimulus Package (aka the  American Recovery and Reinvestment Act of 2009).  The Obama Administration has set up to "feature information on how the Act is working, tools to help you hold the government accountable, and up-to-date data on the expenditure of funds." refers to some implementation details:

To meet these objectives, the President is directing Federal agencies to take critical steps in preparation for the Act’s implementation.  See here, for the White House’s February 9, 2009 initial implementation memorandum and February 18 detailed guidance memorandum.

The detailed memorandum (p. 56) issues the following requirement involving the use of web feeds:

For each of the near term reporting requirements (major communications, formula block grant allocations, weekly reports) agencies are required to provide a feed (preferred: Atom 1.0, acceptable: RSS) of the information so that content can be delivered via subscription.
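To make the feed requirement concrete, here is a minimal Python sketch of what consuming such an Atom 1.0 feed could look like, using only the standard library. The feed below is entirely made up (titles, ids, dates, and the example.org link are placeholders, not real agency data), and the function name `list_entries` is my own.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom 1.0 elements.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up feed for illustration; not real agency data.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example agency weekly reports</title>
  <id>urn:example:agency:weekly</id>
  <updated>2009-03-06T00:00:00Z</updated>
  <entry>
    <title>Weekly report, week 1</title>
    <id>urn:example:agency:weekly:1</id>
    <updated>2009-03-06T00:00:00Z</updated>
    <link rel="enclosure" href="http://example.org/reports/week1.xls"/>
  </entry>
</feed>
"""


def list_entries(feed_xml):
    """Return title, updated timestamp, and first link href for each entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "href": link.get("href") if link is not None else None,
        })
    return entries


for item in list_entries(SAMPLE_FEED):
    print(item["updated"], item["title"], item["href"])
```
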

Erik has been leading our efforts in making sense of the implementation memoranda — you can track his findings in a series of blog posts (listed in reverse chronological order, of course):

As we've read through the memoranda, we've been confused by a variety of matters, which we hope others can help us with:

  1. On p. 56,  you find "Note that the body of the email should include the appropriate completed template as an attachment and should include the name, title, and contact information for the submitter. Templates for these files can be found at"  The URL requires a password to access.  Assuming that the template is not some state secret, can someone make those templates available to the public?
  2. Does anyone know of a way to get a list of all "federal block grant programs" relevant to the Stimulus Package?
  3. How can we get a complete list of CFDA Program Numbers?  One way is to download the latest 2000+ page PDF catalog and scrape the list.  Is there an easier way?
  4. How can we get a list of all Treasury Account Symbols (TAS)?  I think I found a list, specifically the FAST book (in Word format).  I looked for one of the TAS mentioned in a stimulus feed from the DOJ, 15-0402-OJP, and couldn't find that TAS specifically.  I'm not surprised, since this might be a new TAS, but should we expect to start seeing the new TAS in the FAST book?
  5. Can anyone help us decipher Treasury Appropriation Fund Symbols (TAFS)? TAFS seem to be a primary mechanism for agencies to distinguish recovery vs non-recovery spending in their reporting. On p. 6, we read  "Agencies must establish unique Treasury Appropriation Fund Symbols (TAFSs) in their financial systems for all Recovery Act funding, unless a waiver is granted by the Director of OMB by February 25th."   Can we get help to get that list of TAFSs?  On p. 32, we have "OMB will post the list of TAFSs on the Budget Execution and Recovery Funding page of the Budget Community; the URL is "  Is there any reason that list should be available only to government employees?
  6. Can someone show how to map the detailed list of spending items in the Stimulus package (such as the analysis at ProPublica) to Treasury Account Symbols?

Any help with any of these questions would be greatly appreciated!