<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Unbound &#187; screen scraping</title>
	<atom:link href="http://blog.dataunbound.com/category/screen-scraping/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.dataunbound.com</link>
	<description>Helping organizations access and share data effectively.  Special focus on web APIs for data integration.</description>
	<lastBuildDate>Sat, 12 Feb 2011 21:00:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
  <link>http://blog.dataunbound.com</link>
  <url>http://blog.dataunbound.com/wp-content/plugins/favicon-manager/dataunbound.ico</url>
  <title>Data Unbound</title>
</image>
		<item>
		<title>Cool to see a digital historian explain screen-scraping</title>
		<link>http://blog.dataunbound.com/2007/05/23/cool-to-see-a-digital-historian-explain-screen-scraping/</link>
		<comments>http://blog.dataunbound.com/2007/05/23/cool-to-see-a-digital-historian-explain-screen-scraping/#comments</comments>
		<pubDate>Wed, 23 May 2007 20:51:20 +0000</pubDate>
		<dc:creator>Raymond Yee</dc:creator>
				<category><![CDATA[digital scholarship]]></category>
		<category><![CDATA[higher education]]></category>
		<category><![CDATA[humanities]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://blog.dataunbound.com/2007/05/23/cool-to-see-a-digital-historian-explain-screen-scraping/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Cool+to+see+a+digital+historian+explain+screen-scraping&amp;rft.aulast=&amp;rft.aufirst=&amp;rft.subject=digital+scholarship&amp;rft.subject=higher+education&amp;rft.subject=humanities&amp;rft.subject=screen+scraping&amp;rft.source=Data+Unbound&amp;rft.date=2007-05-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://blog.dataunbound.com/2007/05/23/cool-to-see-a-digital-historian-explain-screen-scraping/&amp;rft.language=English"></span>
I&#039;m adding Digital History Hacks to my list of weblogs to follow on the strength of its author, William J. Turkel, being a historian who works in &#034;digital history&#034; and writes about web spidering and scraping. To wit, Digital History Hacks: Teaching Young Historians to Search, Spider and Scrape: To get the most out of the [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Cool+to+see+a+digital+historian+explain+screen-scraping&amp;rft.aulast=&amp;rft.aufirst=&amp;rft.subject=digital+scholarship&amp;rft.subject=higher+education&amp;rft.subject=humanities&amp;rft.subject=screen+scraping&amp;rft.source=Data+Unbound&amp;rft.date=2007-05-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://blog.dataunbound.com/2007/05/23/cool-to-see-a-digital-historian-explain-screen-scraping/&amp;rft.language=English"></span>
<p>I&#039;m adding <a href="http://digitalhistoryhacks.blogspot.com/" class="external">Digital History Hacks</a> to my list of weblogs to follow on the strength of its author, <a href="http://history.uwo.ca/faculty/turkel/" class="external">William J. Turkel</a>, being a historian who works in &#034;digital history&#034; and writes about web spidering and scraping. To wit, <a href="http://digitalhistoryhacks.blogspot.com/2005/12/teaching-young-historians-to-search.html" class="external">Digital History Hacks: Teaching Young Historians to Search, Spider and Scrape</a>:</p>
<blockquote>
<p>To get the most out of the web, however, it is crucial that we begin to teach history students the rudiments of web programming. Spidering, for example, is the (automated) process of visiting a webpage, creating an index and a list of links to further pages, and then following each of those in turn and doing the same thing. Whenever we follow the citations in a footnote to another source, and then begin to read its footnotes, we are doing a kind of spidering. By teaching students how to implement this process on the computer we will not only teach them a crucial skill, we will make them more aware of the technologies that have long underlain the historian&#039;s craft.</p>
<p>Scraping refers to the process of mechanically extracting information from sources (like webpages) that are intended to be read by people rather than machines. Because computers don&#039;t understand text in the way that people do, scraping has to rely on the form of the text to extract information, rather than the meaning. As a result, scrapers are &#039;brittle&#039;: if the form changes, the scraper breaks. For this reason, it is important for historians to be able to create their own tools, rather than using the tools created by others, and this, again, means that it is necessary to learn some rudimentary web programming.</p>
</blockquote>
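<p>The spidering loop in the passage above (visit a page, index it, list its links, then follow each link and repeat) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the <code>PAGES</code> dictionary is a hypothetical in-memory stand-in for the web, and the <code>spider</code> function and its names are illustrative, not taken from Turkel&#039;s post. A real spider would fetch pages over HTTP and parse the links out of the HTML.</p>
<pre><code class="language-python">
from collections import deque

# Hypothetical in-memory "web": each URL maps to its page text and outgoing links.
# A real spider would fetch pages over HTTP and parse links out of the HTML.
PAGES = {
    "home": {"text": "history of maps", "links": ["archive", "about"]},
    "archive": {"text": "maps and letters", "links": ["home", "letters"]},
    "about": {"text": "about this site", "links": []},
    "letters": {"text": "letters from 1850", "links": ["archive"]},
}


def spider(start, pages):
    """Visit pages breadth-first from start; return (visit order, word index)."""
    index = {}          # word -> set of URLs whose text contains it
    order = []          # URLs in the order they were visited
    seen = {start}      # guards against revisiting pages in link cycles
    queue = deque([start])
    while queue:
        url = queue.popleft()
        page = pages.get(url)
        if page is None:    # dangling link: nothing to visit
            continue
        order.append(url)
        for word in page["text"].split():
            index.setdefault(word, set()).add(url)
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order, index


order, index = spider("home", PAGES)
print(order)
print(sorted(index["maps"]))
</code></pre>
<p>The <code>seen</code> set plays the role of a reader remembering which footnotes have already been chased: without it, the cycle between the hypothetical <code>home</code> and <code>archive</code> pages would keep the spider going forever.</p>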
]]></content:encoded>
			<wfw:commentRss>http://blog.dataunbound.com/2007/05/23/cool-to-see-a-digital-historian-explain-screen-scraping/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

