humanities

Cool to see a digital historian explain screen-scraping

I'm adding Digital History Hacks to my list of weblogs to follow, on the strength of its author (William J. Turkel) being a historian who works in "digital history" and writes about web spidering and scraping. To wit, from Digital History Hacks: Teaching Young Historians to Search, Spider and Scrape:

    To get the most out of the web, however, it is crucial that we begin to teach history students the rudiments of web programming. Spidering, for example, is the (automated) process of visiting a webpage, creating an index and a list of links to further pages, and then following each of those in turn and doing the same thing. Whenever we follow the citations in a footnote to another source, and then begin to read its footnotes, we are doing a kind of spidering. By teaching students how to implement this process on the computer we will not only teach them a crucial skill, we will make them more aware of the technologies that have long underlain the historian's craft. Scraping refers to the process of mechanically extracting information from sources (like webpages) that are intended to be read by people rather than machines. Because computers don't understand text in the way that people do, scraping has to rely on the form of the text to extract information, rather than the meaning. As a result, scrapers are 'brittle': if the form changes, the scraper breaks. For this reason, it is important for historians to be able to create their own tools, rather than using the tools created by others, and this, again, means that it is necessary to learn some rudimentary web programming.
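
To make the two ideas in the quote concrete, here is a minimal sketch in Python. It is my own illustration, not Turkel's code. The tiny in-memory "web" (PAGES) and the names spider and scrape_title are made up for the example; a real spider would fetch pages over HTTP, be polite about it, and index more than titles.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": page name -> HTML.
# Assumption for illustration only; nothing here touches the network.
PAGES = {
    "a.html": '<title>Essay</title><a href="b.html">note 1</a><a href="c.html">note 2</a>',
    "b.html": '<title>Source B</title><a href="c.html">see also</a>',
    "c.html": '<title>Source C</title><a href="a.html">back</a>',
}


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def spider(start, pages):
    """Visit a page, index it, queue its links, and repeat (breadth-first)."""
    index = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in index or url not in pages:
            continue  # already visited, or a link that leads nowhere
        parser = LinkAndTitleParser()
        parser.feed(pages[url])
        index[url] = parser.title
        queue.extend(parser.links)
    return index


def scrape_title(html):
    """Deliberately brittle scraper: depends on the exact form '<title>...</title>'."""
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return None
    return html[start + len("<title>"):end]


if __name__ == "__main__":
    print(spider("a.html", PAGES))
    print(scrape_title("<title>Essay</title>"))
    print(scrape_title("<TITLE>Essay</TITLE>"))  # same meaning, different form
```

The spider is the footnote-chasing in the quote: read a page, note what it cites, then go read those. The last call shows the brittleness Turkel describes: capitalizing the tag leaves the meaning of the page untouched, but scrape_title matches on form and comes back empty-handed.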

digital scholarship
higher education
humanities
screen scraping

Volker Wulf on communities of practice

Friday's talk by Volker Wulf at the "Friday afternoon seminar" on designing for specific communities of practice prompted me to look up a number of concepts and people, including:

humanities
iSchool

Web 2.0 in instruction; a book on digital humanities; UIUC folks

Two words from the second half of Spotlight on Web 2.0 12-8-06 1-5-07 FridayLive! TLT Group Online Institute resonated with me:

  • self-service
  • disaggregation

In the session, I also learned about the course ETEC 527: Technologies for Instructional Delivery.

To dig deeper into digital humanities, I will read A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth. Oxford: Blackwell, 2004. I will also note that there are many great faculty at the UIUC Graduate School of Library and Information Science studying scholarly work.

Uncategorized
humanities
web20
