March 13th, 2008
I wasn’t able to make it to the Open Library Developers Meeting 2008 (Open Library) because I was in Los Angeles, but I look forward to catching up on what happened that day. I’m excited to see how far the Open Library project will get in making data about books freely available to the world, not only through a user interface but also through an API that lets people mash up the data.
Posted in Uncategorized | No Comments »
January 31st, 2008
MashupAwards - best mashups on the web is a good list of mashups.
As I learn Django, a Python web programming framework, I’m starting to think about alternative frameworks, such as Ruby on Rails and Symfony (for PHP5). Is Symfony something to recommend to my students?
Posted in Uncategorized | No Comments »
January 25th, 2008
In several weeks, I’ll be giving a talk to campus IT staff. I’ve long wanted to talk up the value of such services as Amazon EC2 and S3. Whenever I bring them up, I tend to talk abstractly about all the possibilities. I just came across a nice concrete example in a blog that I only recently discovered: Self-service, Prorated Super Computing Fun! on open.blogs.nytimes.com, a blog about open source at the NY Times. The post describes how the author used EC2 and S3 to convert millions of articles to PDF files:
I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. (In fact, it work so well that we ran it twice, since after we were done we noticed an error in the PDFs.)
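A quick back-of-the-envelope check of those numbers, as a rough sketch only. It assumes throughput scales linearly with the number of instances and rounds "just under 24 hours" up to 24; neither assumption comes from the NYT post, and the function names are my own.

```python
# Figures quoted in the NYT post.
ARTICLES = 11_000_000
INSTANCES = 100
HOURS = 24  # "just under 24 hours", rounded up (my assumption)


def per_instance_rate(articles, instances, hours):
    """Average PDFs produced per instance per hour."""
    return articles / (instances * hours)


def hours_needed(articles, instances, rate):
    """Hours to finish the job, assuming linear scaling (my assumption)."""
    return articles / (instances * rate)


rate = per_instance_rate(ARTICLES, INSTANCES, HOURS)
four_machine_hours = hours_needed(ARTICLES, 4, rate)

print(f"{rate:.0f} PDFs per instance-hour")
print(f"{four_machine_hours / 24:.0f} days on four machines")
```

Under those assumptions, each instance handled roughly 4,583 articles an hour, and the same job on four machines would have taken about 25 days.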
Wow, we as individuals have access to more and more computing power at lower prices all the time. I’ve long wanted to make use of the EC2 and S3 infrastructure. I don’t think that many people on campus know about EC2 and S3. Researchers who need a lot of computational power might build their own clusters or access the central campus services — or they may start using things like EC2 and S3. (That’s my argument). So far I’ve not had any need for S3 and EC2 — but I’m pretty sure that this year will bring some projects my way that will give me an excuse to use EC2 and S3!
(BTW, I’m thrilled to learn about open.blogs.nytimes.com, which lets geeks who are also fans of the Times get a glimpse into the IT technology behind an important online paper.)
Posted in journalism | No Comments »
January 25th, 2008
It’ll be fun to work through Visualizing Data after I get through reading Programming Collective Intelligence. But instead of just reading books, I need to have some specific problems in mind, which I do. More soon on what those problems are.
Posted in Uncategorized | No Comments »
January 25th, 2008
When I saw Copycat Articles Seem Rife in Science Journals, a Digital Sleuth Finds - Chronicle.com, I was curious about the technology behind the findings. How did the researchers figure out the level of duplication in the medical literature? One aspect was the use of eTBlast. (To learn more, I can follow up by reading the Nature News article, How many papers are just duplicates?, which in turn points to the full article, A tale of two citations : Article : Nature.)
Posted in data mining, open access | No Comments »
January 18th, 2008
I look forward to the starting up of the Buckland/Larson/Lynch seminar next week.
I’m pleased to see the word “mash-up” used in an article about a Berkeley website: 01.16.2008 - New life for the New Deal:
“I realized I couldn’t do it myself,” Brechin says. “It had to be people all over California working collaboratively,” in an echo of the New Deal itself. He turned to the campus’s Institute for Research on Labor and Employment and the California Studies Center, which teamed up to take over the project’s website. Designed and managed by volunteers, the site had been built around a “mash-up” - a database-driven system that could display research on New Deal sites on a dynamically created map — created by Jay McCauley, a retired Silicon Valley software-engineering director.
Time to check out the mashup in question: Living New Deal Project
I’m excited that Aaron Swartz has set up (theinfo):
This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It’s a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.
I’m going to work in materials from Flickr: The Commons when I come back to building the ScholarsBox. Such good news: having photos from the Library of Congress hosted at Flickr makes them much more reusable than when the photos sat at LC alone.
Check out smARThistory.org — the multimedia art history book: Europe:
This web-booksite is being developed by Beth Harris and Steven Zucker as a dynamic enhancement (or even substitute) for the static traditional art history textbook. By using the strengths of podcasting, video, and other web 2.o technologies, we think we can better meet the needs of students, faculty, and the interested public. Once this site is better established, we intend to invite the user community to add and edit content.
Posted in Uncategorized, notelets | No Comments »
October 29th, 2007
Commercial Gigapan is a fancy add-on to a digital camera that allows you to take a series of photos that can be stitched into a high-resolution panorama. I applied to be part of the beta program, but haven’t heard yet whether I’ve been accepted into it. Will I have the privilege of paying $279 for a prototype?
Update: I just got an email stating that the program had received an overwhelming number of applications and that everyone who had applied will hear of his status in the next few weeks.
Posted in hardware, imaging, prototype | No Comments »
September 25th, 2007
Picnik, the web-based photo editor, encourages me to experiment with my Flickr photos by its tight integration with Flickr. I can view my Flickr photos, edit any given one, and then send it back to Flickr — all within Picnik. (Compare this photo to the original.)
Posted in Uncategorized | No Comments »
September 24th, 2007
I’m giving a talk on Wednesday at the School of Information Sciences, University of Pittsburgh: Web Mashups, Recombinatory Data and the Academy:
Yee will examine how, with relatively little effort, individuals are recombining digital content from the Web to create sophisticated mashups. The mashups often provide entirely new understandings of that content. This talk will survey the world of mashups, how they are created, how people learn to make them — and specifically, the implications of recombinatory data and services for the university.
There’s a growing body of academic research around tagging. (I’ll think more deeply about this research when I sit down to design software that makes use of tagging for discovery, etc.) For example, The Social Structure of Tagging Internet Video on del.icio.us:
Since the system:media:video tag is automatically attached to bookmarks, we are able to access a stream of content whose characteristics are relatively independent from the users’ tagging behavior. Otherwise it is very difficult to obtain a data sample that is not biased in some way toward particular users, tags or content. Consequently, we our focus is not on the behaviors of specific users. However, since we are interested describing large-scale effects we will not worry about this issue here.
When I get seriously into studying machine learning, I’ll consult the following resources:
Posted in Uncategorized | No Comments »
June 20th, 2007
As I write my book, I find the article ONLamp.com: Why Do People Write Free Documentation? Results of a Survey quite interesting. The book isn’t exactly “free documentation” although I’m putting my book online for free downloading.
Besides reading a book, I find it helpful to hear the author talk about his or her book. Hence I recommend IT Conversations: Leonard Richardson, Sam Ruby to those reading Richardson and Ruby’s Restful Web Services.
Details on using easy_install with Python: Python Cheese Shop : Browse lists the high-level categories in the repository of Python packages. See also Python Cheese Shop : Home: “The Python Cheese Shop is a repository of software for the Python programming language. There are currently 2455 packages here.”
Posted in notelets | No Comments »