01.31.08

MashupAwards, Symfony and web frameworks

Posted in Uncategorized at 7:09 pm by yee

The "MashupAwards - best mashups on the web" site is a good list of mashups.

As I learn Django, a Python web programming framework, I'm starting to think about alternative frameworks, such as Ruby on Rails and Symfony (for PHP5). Is Symfony something to recommend to my students?

01.25.08

A nice example of how useful Amazon EC2 and S3 can be

Posted in journalism at 6:04 pm by yee

In a few weeks, I'll be giving a talk to campus IT staff, and I've long wanted to talk up the value of services such as Amazon EC2 and S3. Whenever I've brought them up, I've tended to speak abstractly about all the possibilities. Now I've come across a nice concrete example on a blog I only recently discovered: "Self-service, Prorated Super Computing Fun!" on open.blogs.nytimes.com, a blog about open source at the NY Times. The post describes how the author used EC2 and S3 to convert millions of articles into PDF files:

    I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. (In fact, it work[ed] so well that we ran it twice, since after we were done we noticed an error in the PDFs.)

Wow: as individuals, we have access to ever more computing power at ever lower prices. I've long wanted to make use of the EC2 and S3 infrastructure, and I don't think many people on campus know about these services yet. Researchers who need a lot of computational power might build their own clusters or use the central campus services, or they may start using services like EC2 and S3. (That's the argument I plan to make.) So far I haven't had any need for EC2 or S3, but I'm pretty sure this year will bring some projects my way that give me an excuse to use them!
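To put the quoted numbers in perspective, here's a quick back-of-the-envelope calculation in Python. It uses only the figures from the post (11 million articles, 100 instances, about 24 hours, 1.5TB of output), treats 24 hours as the runtime even though the job took slightly less, and assumes the work scales linearly with the number of machines, which real batch jobs only approximate.

```python
# Back-of-the-envelope numbers for the NYT PDF job, using only the figures
# quoted in the post. Linear scaling across machines is an assumption.
ARTICLES = 11_000_000
INSTANCES = 100
HOURS = 24        # upper bound: the post says "just under 24 hours"
OUTPUT_TB = 1.5

def per_instance_rate(articles: int, instances: int, hours: float) -> float:
    """Average articles converted per instance per hour."""
    return articles / (instances * hours)

def avg_output_kb(total_tb: float, articles: int) -> float:
    """Average output per article in KB (decimal units: 1 TB = 10**9 KB)."""
    return total_tb * 10**9 / articles

def hours_needed(articles: int, instances: int, rate_per_hour: float) -> float:
    """Wall-clock hours for a fleet of a given size, assuming perfect parallelism."""
    return articles / (instances * rate_per_hour)

rate = per_instance_rate(ARTICLES, INSTANCES, HOURS)
print(f"~{rate:,.0f} articles per instance per hour (~{rate / 3600:.2f} per second)")
print(f"~{avg_output_kb(OUTPUT_TB, ARTICLES):,.0f} KB of PDF per article")
# The same per-instance rate, but on only four machines:
print(f"~{hours_needed(ARTICLES, 4, rate) / 24:.0f} days with 4 instances")
```

Running it gives roughly 4,583 articles per instance per hour, about 136 KB of PDF per article, and about 25 days for the same job on four machines, which squares with the author's sense that four machines "could take some time."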

(BTW, I'm thrilled to learn about open.blogs.nytimes.com, which lets geeks who are also fans of the Times get a glimpse into the technology behind an important online newspaper.)

More technical books on my reading list

Posted in Uncategorized at 5:45 pm by yee

It'll be fun to work through Visualizing Data after I finish reading Programming Collective Intelligence. But rather than just reading books, I need to have some specific problems in mind, which I do. More soon on what those problems are.

Interesting application of scholarly data mining

Posted in data mining, open access at 5:39 pm by yee

When I saw "Copycat Articles Seem Rife in Science Journals, a Digital Sleuth Finds" on Chronicle.com, I was curious about the technology behind the findings. How did the researchers measure the level of duplication in the medical literature? Part of the answer is their use of eTBlast, a text-similarity search engine. To learn more, I can follow up by reading the Nature News article "How many papers are just duplicates?", which in turn points to the full article, "A tale of two citations" (Nature).
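I don't know the internals of eTBlast, so what follows is only a toy illustration of the general idea of flagging near-duplicate texts, not eTBlast's method: represent each abstract as a bag of word counts, compute the cosine similarity of every pair, and report pairs above a threshold. The sample abstracts and the 0.8 threshold are made up for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for real text preprocessing."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 to 1.0 for counts)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def likely_duplicates(abstracts: dict, threshold: float = 0.8) -> list:
    """Return (id1, id2, score) for each pair at or above the (arbitrary) threshold."""
    vectors = {doc_id: tokens(text) for doc_id, text in abstracts.items()}
    pairs = []
    for (id1, v1), (id2, v2) in combinations(vectors.items(), 2):
        score = cosine(v1, v2)
        if score >= threshold:
            pairs.append((id1, id2, round(score, 3)))
    return sorted(pairs, key=lambda p: -p[2])

# Hypothetical abstracts: B rewords A; C is unrelated.
abstracts = {
    "A": "Aspirin reduces the risk of stroke in elderly patients.",
    "B": "In elderly patients, aspirin reduces the risk of stroke.",
    "C": "Statins lower cholesterol in adults with diabetes.",
}
print(likely_duplicates(abstracts))  # [('A', 'B', 1.0)]
```

Comparing every pair is quadratic in the number of documents, so anything at the scale of MEDLINE would need an index or a candidate-filtering step first; this version only shows the scoring.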

01.18.08

Notelets for 2008.01.17

Posted in Uncategorized, notelets at 9:50 am by yee

I'm looking forward to the start of the Buckland/Larson/Lynch seminar next week.

I'm pleased to see the word "mash-up" used in a January 16, 2008 article about a Berkeley website, "New life for the New Deal":

    "I realized I couldn’t do it myself," Brechin says. "It had to be people all over California working collaboratively," in an echo of the New Deal itself. He turned to the campus’s Institute for Research on Labor and Employment and the California Studies Center, which teamed up to take over the project’s website. Designed and managed by volunteers, the site had been built around a "mash-up" - a database-driven system that could display research on New Deal sites on a dynamically created map — created by Jay McCauley, a retired Silicon Valley software-engineering director.

Time to check out the mashup in question: the Living New Deal Project.
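I haven't looked at how the Living New Deal mashup is actually built, so this is only a sketch of the general pattern the article describes: rows in a database rendered as points on a map. It assumes a hypothetical `sites` table (name, city, latitude, longitude) and uses GeoJSON as the format handed to a map client; the table, columns, and sample rows are all invented for illustration.

```python
import json
import sqlite3

def sites_to_geojson(conn: sqlite3.Connection) -> dict:
    """Turn rows from a hypothetical `sites` table into a GeoJSON FeatureCollection."""
    features = []
    for name, city, lat, lon in conn.execute(
        "SELECT name, city, lat, lon FROM sites ORDER BY name"
    ):
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name, "city": city},
        })
    return {"type": "FeatureCollection", "features": features}

# Placeholder data standing in for the project's real research records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (name TEXT, city TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?, ?)",
    [("Example site A", "Berkeley", 37.87, -122.27),
     ("Example site B", "Oakland", 37.80, -122.27)],
)
print(json.dumps(sites_to_geojson(conn), indent=2))
```

A map library that reads GeoJSON could then plot the points; regenerating this output from the database on each request is one way to get a map that is "dynamically created," as the article puts it.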

I'm excited that Aaron Swartz has set up theinfo:

    This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It's a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.

I'm going to work in materials from Flickr: The Commons when I return to building the ScholarsBox. Such good news: having photos from the Library of Congress hosted on Flickr makes them much more reusable than when they sat at LC alone.

Check out smARThistory.org, the multimedia art history book (Europe):

    This web-booksite is being developed by Beth Harris and Steven Zucker as a dynamic enhancement (or even substitute) for the static traditional art history textbook. By using the strengths of podcasting, video, and other web 2.0 technologies, we think we can better meet the needs of students, faculty, and the interested public. Once this site is better established, we intend to invite the user community to add and edit content.