Adding all tabs to Zotero Version 2 — scraping translatable sites

In a previous post, "Adding all Firefox tabs to Zotero using Chickenfoot", I showed how to write a Chickenfoot script that loops through all Firefox tabs and adds each of them as an item in Zotero. A limitation of that script was that it called only ZoteroPane.addItemFromPage for every tab, even if a given tab had a translator that could be used to save the item. As explained in "Adding items to Zotero with Chickenfoot", you can use Zotero_Browser.scrapeThisPage to invoke the appropriate translator for a tab. The reason I didn't use Zotero_Browser.scrapeThisPage in that script is that I didn't know how to write a function to determine whether a suitable translator exists.

Now I think I've come up with a way of determining whether a translator exists, though I'm not highly confident that the solution is foolproof. I'll share my Chickenfoot script here, explain the logic behind it, and write about its possible limitations. First, the script:

// add_each_tab_to_Zotero_2.js
// R. Yee

var Zotero = chromeWindow.Zotero;
var ZoteroPane = chromeWindow.ZoteroPane;
var Zotero_Browser = chromeWindow.Zotero_Browser;
var tabBrowser = getTabBrowser(chromeWindow);

// getIcon returns the URL of the translator icon for the current tab,
// false if there is no suitable translator, and
// 'chrome://zotero/skin/treesource-collection.png'
// if there are multiple savable items on the page

function getIcon() {

  var browser = Zotero_Browser.tabbrowser.selectedBrowser;
  var tab0 = new Zotero_Browser.Tab(browser);

  // need to figure out whether doc is HTMLDocument
  // doc instanceof HTMLDocument doesn't seem to work here

  var doc = browser.contentWindow.document;

  // in emulation of
  // https://www.zotero.org/trac/browser/extension/tags/1.0.7/chrome/content/zotero/browser.js#L311

  var rootDoc = doc;
  if (rootDoc.defaultView) {
    while(rootDoc.defaultView.frameElement) {
      rootDoc = rootDoc.defaultView.frameElement.ownerDocument;
    }
  }

  // detect possible translators and return the corresponding icon
  tab0.detectTranslators(rootDoc,doc);
  return tab0.getCaptureIcon();

} // getIcon()

// create a new collection with current date
var new_Collection = Zotero.Collections.add("_Saved " + (new Date()).toLocaleString());

// output # tabs
output(tabBrowser.browsers.length);

// loop through tabs, selecting each one in turn
for (var i=0; i < tabBrowser.browsers.length; i++) {
  tabBrowser.mTabContainer.advanceSelectedTab(1, true);
  output(tabBrowser.selectedBrowser.contentWindow.location);
  var icon = getIcon();
  // if icon is not false and not representing multiple items -- scrape page
  if (icon && icon != 'chrome://zotero/skin/treesource-collection.png') {
    Zotero_Browser.scrapeThisPage(new_Collection.id);
  // otherwise add item as a generic web page
  } else {
    ZoteroPane.addItemFromPage(new_Collection.id);
  }
}

A few points about the script:

When I ran the script with 3 tabs, it seemed to work fine. When I had 20+ tabs, all the tabs were saved, but only the first one ended up in the right collection. (I don't know why.) Also, the code is not terribly elegant: for example, it creates a new Zotero_Browser.Tab for each tab. I figure there should be some way to read off whether an icon already exists in the interface without having to recalculate the possible translators.
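
The decision made inside the loop can be pulled out into a small function of its own. What follows is only a minimal sketch, not part of the original script: chooseSaveAction and MULTIPLE_ITEMS_ICON are hypothetical names introduced for illustration, and the function merely returns a label instead of calling Zotero, so it can be tried outside Firefox.

```javascript
// Icon URL that getIcon() returns when a page holds multiple savable items
// (value copied from the script above).
const MULTIPLE_ITEMS_ICON = 'chrome://zotero/skin/treesource-collection.png';

// Hypothetical helper: given the value returned by getIcon(), say which
// Zotero call the loop would make for the current tab.
function chooseSaveAction(icon) {
  // A truthy icon other than the multiple-items icon means that a single
  // translator was detected, so the page can be scraped.
  if (icon && icon !== MULTIPLE_ITEMS_ICON) {
    return 'scrapeThisPage';
  }
  // No translator, or multiple items: fall back to a generic web page item.
  return 'addItemFromPage';
}
```

In the loop, the returned label would correspond to calling Zotero_Browser.scrapeThisPage or ZoteroPane.addItemFromPage with new_Collection.id. Keeping the decision separate from the Zotero calls makes it possible to check the branching logic without opening any tabs.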

{ 3 } Comments

  1. Porreesteiner | January 16, 2009 at 3:25 pm | Permalink

    Dear Raymond, thanks again for your scripts. It works great for translatable pages and helps me a lot in daily work. As I am completely new to this field and cannot code it on my own so far, I'm interested in a work-around to get rid of the limitation that for non-translatable pages only the first tab gets saved into all collections. From my observations, I got the impression that there is a first run in which all tabs get analysed and, if they're not translatable, only the tab's name is saved so that its snapshot can be attached later. Then in the second run all collections are filled with a snapshot of only the first tab.
    Are there ways to get around this, such as writing a script where snapshotted tabs get closed, so the next tab gets snapshotted afterwards? Or not to "add an item" from every page but to directly "snapshot" the page (accepting all the related shortcomings) to avoid the error-prone second run mentioned above? Do you know the command for a direct snapshot, so that I could experiment with it? Would it be necessary to integrate some tab-changing routine after the "else" command again? Sorry for bombarding you with questions, but I find this very exciting and I highly appreciate the effort you have made so far.
    Best regards
    Porreesteiner

  2. Raymond Yee | January 23, 2009 at 12:41 pm | Permalink

    I've been working on figuring out why the script doesn't get all translatable sites saved to the correct folder, but so far I've not been able to pin down an example that I can make fail reliably. Here are my observations so far:

    1) The script works until at some point Zotero seems to enter into some "bad state". By "bad state" I mean one in which it no longer adds an item to the right folder when a translator is invoked. I've seen this happen after invoking the NY Times translator on

    http://www.nytimes.com/2009/01/20/us/politics/20text-obama.html?_r=1&em=&pagewanted=all

    Once Zotero is in this "bad state", I've gotten it out of that state by restarting Firefox.

    2) I've not been able to reliably create this bad state — it happens but I've not figured out what causes it.

    I'll keep my eyes on the problem and see whether I can figure it out. In the meantime, I'll also look for alternative Zotero commands and approaches.

  3. Porreesteiner | January 23, 2009 at 1:25 pm | Permalink

    Thank you for your reply. The behaviour that you describe is exactly what I observed. I don't know if Zotero has this problem because of some wrong chronological order in the single steps, as I mentioned above. Would it be possible to create some kind of time-out after archiving a single item to prevent "hiccups"?
    Do you know the snapshot command in Zotero? Maybe there's a chance to automate at least this procedure. I could reproduce the "bad state" when trying to scrape items from http://www.discogs.com, a big online music release database made by user contributions.

{ 1 } Trackback

  1. […] Clearly I chose Zombies because it lets me make all kinds of terrible puns. You can see links to the 300 photos I looked at in this Zotero collection. You can also browse through them in this Simile Exhibit I created with MIT’s Citeline. For those interested, I pulled all of these photos into a Zotero library using Raymond Yee's Chickenfoot script for running Zotero translators against all your open Firefox …. […]
