In a few weeks, I'll be giving a talk to campus IT staff. I've long wanted to talk up the value of services such as Amazon EC2 and S3, but whenever I bring them up, I tend to speak in the abstract about the possibilities. I just came across a concrete example on a blog I recently learned about: Self-service, Prorated Super Computing Fun! on open.blogs.nytimes.com, a blog about open source at the NY Times. The post describes how the author used EC2 and S3 to convert millions of articles to PDF files:
I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. (In fact, it worked so well that we ran it twice, since after we were done we noticed an error in the PDFs.)
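It's worth pausing on those numbers. Assuming the work scales roughly linearly across machines (a simplification; the post doesn't give per-article timings), the quoted figures imply how long the job would have taken on only four machines:

```python
# Back-of-envelope check of the NYT numbers, assuming linear scaling
# and a uniform cost per article (both are my assumptions, not the post's).
ARTICLES = 11_000_000   # "all 11 million articles"
INSTANCES = 100         # "100 EC2 instances"
HOURS = 24              # "just under 24 hours"

# Implied throughput of one instance:
per_instance_per_hour = ARTICLES / (INSTANCES * HOURS)   # ~4,583 articles/hour

# The same job on only four machines:
four_machine_hours = ARTICLES / (4 * per_instance_per_hour)
four_machine_days = four_machine_hours / 24

print(f"{per_instance_per_hour:,.0f} articles/hour per instance")
print(f"~{four_machine_days:.0f} days on four machines")
```

So "it could take some time" is an understatement: roughly 25 days on four machines, versus a day on a hundred rented ones.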
Wow, we as individuals have access to more and more computing power at ever lower prices. I don't think that many people on campus know about EC2 and S3. Researchers who need a lot of computational power might build their own clusters or use the central campus services, or they may start turning to services like EC2 and S3; that's the argument I plan to make. So far I've had no real need for them myself, but I'm pretty sure that this year will bring some projects my way that will give me an excuse to use EC2 and S3!
(BTW, I'm thrilled to learn about open.blogs.nytimes.com, which lets geeks who are also fans of the Times get a glimpse into the technology behind an important online paper.)