
Fine-tuning a Python wrapper for the hypothes.is web API and other #ianno17 followup

In anticipation of #ianno17 Hack Day, I wrote about my plans for the event, one of which was to revisit my own Python wrapper for the nascent hypothes.is web API.

Instead of spending much time on my own wrapper, I spent most of the day working with Jon Udell's wrapper for the API. I've been working on my own revisions of the library but haven't yet incorporated Jon's latest changes.

One nice little piece of the puzzle is that I learned how to introduce retries and exponential backoff into the library, thanks to a hint from Nick Stenning and a helpful answer on Stack Overflow.
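Neither the hint nor the Stack Overflow answer is reproduced here, so the following is only a generic, stdlib-only sketch of retry-with-exponential-backoff, not the code in either wrapper. The function name `call_with_backoff` and its parameters are hypothetical. After each retryable failure it waits `base_delay * 2**attempt` seconds (capped at `max_delay`) and re-raises once the retries run out. The `sleep` parameter is injectable so the delays can be checked without actually waiting.

```python
import time


def call_with_backoff(func, max_retries=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable exception, back off exponentially and retry.

    Hypothetical helper: the delays run base_delay, 2*base_delay, 4*base_delay, ...
    capped at max_delay. After max_retries failed retries, the last error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay)
```

In a real API client, `retry_on` would typically be narrowed to transient errors such as timeouts or connection failures, and random jitter is often added to the delay. If the wrapper uses the `requests` library, mounting an `HTTPAdapter` configured with urllib3's `Retry` object is a common alternative to hand-rolling the loop.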

Other matters

In addition to the Python wrapper, I have other follow-up items. I hope to write about them at greater length down the road, but for the moment I simply note them.

Videos from the conference

I might start by watching videos from the #ianno17 conference: I Annotate 2017 – YouTube. Because I didn't attend the conference itself, I might glean insight into two topics of particular interest to me (the role of page owners in annotations and the intermingling of annotations in ebooks).

An extension for embedding selectors in the URL

I will study and try Treora/precise-links: Browser extension to support Web Annotation Selectors in URIs. I've noticed that the same annotation is shown in two related forms:

Does the precise-links extension let me write the selectors into the URL?
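For reference, the W3C Web Annotation "Selectors and States" note describes a way to serialize a selector into a URL fragment, along the lines of `http://example.org/page1#selector(type=TextQuoteSelector,exact=annotation)`. The sketch below builds such a URL. The helper names are hypothetical, and percent-encoding every value is my own conservative approximation of the note's escaping rules. I haven't confirmed that this is exactly the form precise-links reads or writes.

```python
from urllib.parse import quote


def selector_fragment(selector_type, **fields):
    """Serialize a selector as a 'selector(key=value,...)' fragment (hypothetical helper)."""
    parts = [f"type={selector_type}"]
    for key, value in fields.items():
        # Percent-encode values so commas, '=' and parentheses can't break the syntax.
        parts.append(f"{key}={quote(value, safe='')}")
    return "selector(" + ",".join(parts) + ")"


def url_with_selector(page_url, selector_type, **fields):
    """Attach a selector fragment to a page URL (assumes page_url has no fragment yet)."""
    return f"{page_url}#{selector_fragment(selector_type, **fields)}"
```

For example, `url_with_selector("http://example.org/page1", "TextQuoteSelector", exact="annotation")` produces the example URL above, and a quote containing `", "` would have it encoded as `%2C%20`.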

Revisiting hypothes.is at I Annotate 2017

I'm looking forward to hacking on web and epub annotation at the #ianno17 Hack Day. I won't be at the I Annotate 2017 conference per se but will be curious to see what comes out of the annual conference.

I continue to have high hopes for digital annotations, both on the Web and in non-web digital contexts. I have used Hypothesis on and off since Oct 2013. My experiences so far:

  • I like being able to highlight and comment on very granular sections of articles, something the hypothes.is annotation tool makes easy. I appreciate being able to share annotations and highlights with others (on Twitter or Facebook), though I suspect most people who click on those links wonder "what's this?" when they land on the page. A small user request: hypothes.is should let users better customize the Facebook preview image for an annotation.
  • I've enjoyed using hypothes.is for code review on top of GitHub. (Exactly how hypothes.is complements the extensive code-commenting functionality in GitHub might be worth a future blog post.)

My Plans for Hack Day

Python wrapper for hypothes.is

This week, I plan to revisit rdhyee/hypothesisapi: A Python wrapper for the nascent hypothes.is web API to update or abandon it in favor of new developments. (For example, I should look at kshaffer/pypothesis: Python scripts for interacting with the hypothes.is API.)
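To anchor the comparison between wrappers, here is a minimal stdlib sketch of the one call every wrapper needs: a search against the hypothes.is API. It assumes the public `https://api.hypothes.is/api/search` endpoint, which returns JSON with a `rows` list, and a developer token sent as a Bearer token. The function names are hypothetical and are not taken from hypothesisapi, pypothesis, or Jon Udell's library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ROOT = "https://api.hypothes.is/api"  # assumed base URL of the public API


def build_search_request(token=None, **params):
    """Build a GET request for /search; params (uri, user, tag, limit, ...) become the query string."""
    url = f"{API_ROOT}/search?{urlencode(params)}"
    headers = {"Accept": "application/json"}
    if token:
        # A developer token is only needed for private or group annotations.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers)


def search(token=None, **params):
    """Run the search over the network and return the list of annotation rows."""
    with urlopen(build_search_request(token, **params), timeout=10) as resp:
        return json.load(resp)["rows"]
```

Calling `search(uri="https://example.com/", limit=10)` performs a live request. Keeping request construction separate from execution makes it easy to wrap the network call in retry logic or to test it offline.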

Epubs + annotations

I want to figure out the state of the art for epubs and annotations. I'm happy to see the March 2017 announcement of a partnership to bring open annotation to eBooks. I'd definitely like to figure out how to annotate epubs (e.g., Oral Literature in Africa (at unglue.it) or Moby Dick). The best approach is probably to wait until summer, when we should see the fruits of the partnership:

Together, our goal is to complete a working integration of Hypothesis with both EPUB frameworks by Summer 2017. NYU plans to deploy the ReadiumJS implementation in the NYU Press Enhanced Networked Monographs site as a first use case. Based on lessons learned in the NYU deployment, we expect to see wider integration of annotation capabilities in eBooks as EPUB uptake continues to grow.

In the meantime, I can catch up on the current state of futurepress/epub.js (Enhanced eBooks in the browser), grok Epub CFI Updates, and relearn how to parse epubs using Python (e.g., rdhyee/epub_avant_garde: an experiment to apply ideas from https://github.com/sandersk/ebook_avant_garde to arbitrary epubs).
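As a refresher on parsing epubs with nothing but the standard library, here is a sketch of the first step most tools take. An EPUB is a ZIP archive, and `META-INF/container.xml` points to the package (OPF) file. That file's `manifest` maps ids to files, and its `spine` lists those ids in reading order. The function name is hypothetical, and the sketch ignores edge cases such as percent-encoded or `../` hrefs and multiple renditions.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub):
    """Return the archive paths of an EPUB's content documents in reading (spine) order.

    `epub` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the package (OPF) file via rootfile/@full-path
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))

        # manifest: id -> href (hrefs are relative to the OPF file's directory)
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }
        base = posixpath.dirname(opf_path)
        # spine: ordered itemrefs pointing back into the manifest
        return [
            posixpath.join(base, manifest[ref.get("idref")])
            for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

The same spine order is what EPUB CFIs step through, so this is a natural starting point before tackling CFI handling.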

Role of page owners

I plan to check in on what's going on with efforts at Hypothes.is to involve owners in page annotations:

In the past months we launched a small research initiative to gather different points of view about website publishers and authors consent to annotation. Our goal was to identify different paths forward taking into account the perspectives of publishers, engineers, developers and people working on abuse and harassment issues. We have published a first summary of our discussion on our blog post about involving page owners in annotation.

I was reminded of these efforts after reading that Audrey Watters had blocked annotation services like hypothes.is and Genius from her domains:

In the spirit of communal conversation, I threw in my two cents:

Has there been any serious exploration of easy opt-out mechanisms for domain owners? Something like robots.txt for annotation tools?
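To make the robots.txt analogy concrete, here is a purely hypothetical sketch. It imagines an annotation client that checks a site's robots.txt-style policy before showing annotations on a page. The agent name `example-annotator` and the policy text are made up, and this is not an existing convention that hypothes.is or Genius honors. It only reuses Python's standard `urllib.robotparser` to show how cheap such a check could be.

```python
from urllib.robotparser import RobotFileParser


def annotation_allowed(robots_txt, agent, url):
    """Return True if the robots.txt-style policy text lets `agent` access `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A made-up policy: keep a hypothetical annotation agent away, allow everyone else.
EXAMPLE_POLICY = """\
User-agent: example-annotator
Disallow: /

User-agent: *
Allow: /
"""
```

Under this policy, `annotation_allowed(EXAMPLE_POLICY, "example-annotator", "https://example.com/essay")` is `False`, while any other agent is allowed. The hard parts of a real opt-out are social rather than technical: agreeing on an agent name, and deciding whether annotations live "on" the page at all.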

My thoughts about Fargo.io using fargo.io

Organizing Your Life With Python: a submission for PyCon 2015?

I have penciled into my calendar a trip to Montreal to attend PyCon 2014. In a moment of suboptimal planning, I wrote an overly ambitious abstract for a talk or poster session I was planning to submit. As I sat down this morning to meet the deadline for submitting a poster session proposal (Nov 1), I once again encountered the ominous (but, for me, definitive) admonition:

Avoid presenting a proposal for code that is far from completion. The program committee is very skeptical of "conference-driven development".

It's true: my efforts to organize my life with Python are in the early stages. I hope that I'll be able to write something like the following for PyCon 2015.

Organizing Your Life with Python

David Allen's Getting Things Done (GTD) system is a popular system for personal productivity. Although GTD can be implemented without any computer technology, I have pursued two different digital implementations, including my current implementation using Evernote, the popular note-taking program. This talk explores using Python in conjunction with the Evernote API to implement GTD on top of Evernote. I have found that a major practical hindrance to using GTD is that it is way too easy to commit to too many projects. I will discuss how to combine Evernote, Python, and GTD with concepts from Personal Kanban to solve this problem.
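The core Personal Kanban idea the abstract leans on is a work-in-progress (WIP) limit: only a fixed number of projects can be active at once, and anything beyond that waits. Below is a minimal sketch of that rule in plain Python. It deliberately makes no Evernote API calls, and the class and method names are hypothetical. Projects are plain strings, and overflow lands on a GTD-style "Someday/Maybe" list.

```python
from dataclasses import dataclass, field


@dataclass
class KanbanBoard:
    """Hypothetical GTD project list with a Personal Kanban-style WIP limit."""
    wip_limit: int = 3
    active: list = field(default_factory=list)
    someday: list = field(default_factory=list)

    def commit(self, project):
        """Activate a project only if there is room under the WIP limit."""
        if len(self.active) >= self.wip_limit:
            self.someday.append(project)  # over the limit: park it on Someday/Maybe
            return False
        self.active.append(project)
        return True

    def finish(self, project):
        """Mark an active project done, freeing a slot."""
        self.active.remove(project)

    def pull_next(self):
        """Deliberately pull the oldest parked project in, if there is room."""
        if self.someday and len(self.active) < self.wip_limit:
            project = self.someday.pop(0)
            self.active.append(project)
            return project
        return None
```

Making `pull_next` an explicit step, rather than filling freed slots automatically, mirrors Kanban's "pull" discipline. You choose when to take on the next project.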

Addendum: Whoops…I find it embarrassing that I already quoted my abstract in a previous blog post in September that I had forgotten about. Oh well. Where's my fully functioning organization system when I need it!


Current Status of Data Unbound LLC in Pennsylvania

I'm currently in the process of closing down Data Unbound LLC in Pennsylvania. I submitted the paperwork to dissolve the legal entity in April 2013 and have been amazed to learn that final approval may take up to a year. In the meantime, as I establish a similar California legal entity, I will certainly continue to write on this blog about APIs, mashups, and open data.

Must Get Cracking on Organizing Your Life with Python

Talk and tutorial proposals for PyCon 2014 are due tomorrow (9/15). I was considering submitting a proposal until I took to heart the program committee's apt admonition against "conference-driven development." I will nonetheless use the Oct 15 and Nov 1 deadlines for lightning talks and poster proposals, respectively, to judge whether to submit a refinement of the following proposal idea:

Organizing Your Life with Python

David Allen's Getting Things Done (GTD) system is a popular system for personal productivity. Although GTD can be implemented without any computer technology, I have pursued two different digital implementations, including my current implementation using Evernote, the popular note-taking program. This talk explores using Python in conjunction with the Evernote API to implement GTD on top of Evernote. I have found that a major practical hindrance to using GTD is that it is way too easy to commit to too many projects. I will discuss how to combine Evernote, Python, and GTD with concepts from Personal Kanban to solve this problem.


Embedding Github gists in WordPress

As I gear up to write more about programming, I have installed the Embed GitHub Gist plugin. So by writing

[gist id=5625043]

in the text of this post, I can embed https://gist.github.com/rdhyee/5625043 into the post to get:


Working with Open Data

I'm very excited to be teaching a new course Working with Open Data at the UC Berkeley School of Information in the Spring 2013 semester:

Open data — data that is free for use, reuse, and redistribution — is an intellectual treasure-trove that has given rise to many unexpected and often fruitful applications. In this course, students will 1) learn how to access, visualize, clean, interpret, and share data, especially open data, using Python, Python-based libraries, and supplementary computational frameworks and 2) understand the theoretical underpinnings of open data and their connections to implementations in the physical and life sciences, government, social sciences, and journalism.


A mundane task: updating a config file to retain old settings

I want to have a hand in creating an excellent personal information manager (PIM) that can be a worthy successor to Ecco Pro. So far, running EccoExt (a clever and expansive hack of Ecco Pro) has been an eminently practical solution. You can download the most recent version of this actively developed extension from the files section of the ecco_pro Yahoo! group. I would update regularly, but one painful problem with unpacking the new files (using unrar) was that there was no updater to retain the configuration options of the existing setup. So a mundane but happy-making programming task this afternoon was to write a Python script to do exactly that, using the built-in ConfigParser library (configparser in Python 3).
"""
compare eccoext.ini files

My goal is to edit the new file so that any overlapping values take on the current value

"""
current_file_path = "/private/tmp/14868/C/Program Files/ECCO/eccoext.ini"
new_file_path = "/private/tmp/14868/C/utils/eccoext.ini"
updated_file = "/private/tmp/14868/C/utils/updated_eccoext.ini"

# extract the key value pairs in both files to compare  the two

# http://docs.python.org/library/configparser.html
import ConfigParser

def extract_values(fname):
    # generate a parsed configuration object, set of (section, options)
    config = ConfigParser.SafeConfigParser()
    options_set = set()

    config.read(fname)
    sections = config.sections()
    for section in sections:
        options = config.options(section)
        for option in options:
            #value = config.get(section,option)
            options_set.add((section,option))

    return (config, options_set)

# process current file and new file

(current_config, current_options) = extract_values(current_file_path)
(new_config, new_options) = extract_values(new_file_path)

# what are the overlapping options
overlapping_options = current_options & new_options

# figure out which of the overlapping options are the values different

for (section,option) in overlapping_options:
    current_value = current_config.get(section,option)
    new_value = new_config.get(section,option)
    if current_value != new_value:
        print section, option, current_value, new_value
        new_config.set(section,option,current_value)

# write the updated config file

with open(updated_file, 'wb') as configfile:
    new_config.write(configfile)

MITH API workshop

I'm excited about the upcoming MITH API Workshop, to be held in two weeks (Feb 25-26) at UMD:

The Maryland Institute for Technology in the Humanities will host a two-day workshop on developing APIs (Application Programming Interfaces) for the digital humanities. The workshop will gather 40-50 digital humanities scholars and developers, who along with industry leaders will demonstrate their APIs during this “working weekend.” We will discuss ways that existing and future APIs could be leveraged for digital humanities projects.

As someone who has been fascinated by APIs for years, I hope to learn a lot from my fellow digital humanists about what they care about. One of my tasks is to give an introductory talk about APIs. What do I want to cover? I'm still working out the exact structure, but the following topics come to mind:

  • What APIs are; the relationship between web APIs (the focus of our workshop, I believe) and other APIs
  • How to learn more about APIs
  • APIs of specific interest to the digital humanities, with specific references to Freebase, Google geo-APIs, and OpenLibrary (organizations represented by fellow presenters)
  • Why REST matters (I'll only preview what fellow speaker Peter Keane will bring up in his talk about REST)
  • How to consume APIs; what mashups are
  • How to deploy APIs
  • Open questions I think about
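As a small illustration of the "How to consume APIs" item, here is a sketch against OpenLibrary's public search endpoint, `search.json`. It assumes the response carries matching records in a `docs` list with a `title` field. The function names are my own, and it's worth checking the OpenLibrary API documentation before relying on any other fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPENLIBRARY_SEARCH = "https://openlibrary.org/search.json"


def search_url(query, limit=5):
    """Build the search URL; urlencode handles spaces and special characters."""
    return f"{OPENLIBRARY_SEARCH}?{urlencode({'q': query, 'limit': limit})}"


def search_titles(query, limit=5):
    """Fetch search results over the network and return just the titles."""
    with urlopen(search_url(query, limit), timeout=10) as resp:
        data = json.load(resp)
    return [doc.get("title") for doc in data.get("docs", [])]
```

`search_titles("moby dick")` makes a live HTTP request. The pattern is the same one behind most mashups: build a URL, fetch JSON, and pick out the fields you need.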

Stay tuned. Over the next two weeks, I'll work through these topics for myself (writing on this blog). I'll take this time as an opportunity to revisit what I wrote in Pro Web 2.0 Mashups: Remixing Data and Web Services and the material from the Mixing and Remixing Information course I taught at UC Berkeley for five years.