
Data Hosting vs Data Portability

A friend sent me a link to a recent post by Brad Templeton, Data hosting instead of data portability:

    A data hosting approach has your personal data stored on a server chosen by you. (You might have that server right in your own house, or pay for hosting services.) If you pay, that server’s duty is not to exploit your data, but rather to protect it. That’s what you’re paying for. You can have more than one (with different personas, if you like) but for now let’s imagine having just one. Your data host’s job is to perform actions on your data. Rather than giving copies of your data out to a thousand companies (the Facebook and Data Portability approach) you host the data and perform actions on it, programmed by those companies who are developing useful social applications.

I find data hosting appealing and would like to shift towards hosting my own data as opposed to having my data hosted elsewhere. It’s a matter of making it practical though.

For instance, I’m a big fan of Flickr because it makes it so easy to have my photos taken care of. But ideally, I’d like to host my own photos and directly control how people access them. I’d do that if I could build a good repository and layer services on top of them — just like Flickr. But Flickr has an economy of scale that I don’t have — it can solve that problem and provide the solution to many people.

Now, it’s possible that we can solve that problem too and sell and/or share the solution with lots of people so that they can do more of their own data hosting. Is that a business that I would want to be in?
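To make the photo example concrete: the heart of “directly control how people access them” is just a policy check in front of the files. Here is a minimal sketch — the photo names, users, and visibility levels are all invented for illustration, and the levels only loosely echo Flickr’s public/friends/private model:

```python
# Minimal sketch of per-photo access control for a self-hosted photo repository.
# All names and visibility levels are invented; they loosely echo Flickr's
# public / friends / private model.

PHOTOS = {
    "sunset.jpg": {"owner": "raymond", "visibility": "public"},
    "family.jpg": {"owner": "raymond", "visibility": "friends"},
    "draft.jpg": {"owner": "raymond", "visibility": "private"},
}

FRIENDS = {"raymond": {"alice", "bob"}}  # owner -> set of friends

def can_view(photo_name, viewer):
    """Return True if viewer may see the photo under its visibility policy."""
    photo = PHOTOS.get(photo_name)
    if photo is None:
        return False
    if photo["visibility"] == "public":
        return True
    if viewer == photo["owner"]:
        return True  # owners always see their own photos
    if photo["visibility"] == "friends":
        return viewer in FRIENDS.get(photo["owner"], set())
    return False  # "private": owner only
```

The check itself is trivial; what Flickr’s economy of scale buys is everything around it — storage, bandwidth, identity, uptime.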


Some musings on where I’d like to go next professionally

In January, a correspondent, having heard that I was about to publish a book on mashups, wrote me, saying that he would “love to find out more what [I’m] thinking”. Flattered to be asked, I replied. Here I quote an edited version of what I wrote. (I tend to like what I write in email because my writing tends to be energetically conversational.)

Let me tell you a bit of what I’m thinking and where I’m coming from. Obviously, I think that the topic of mashups is a big deal given my willingness to write a whole book about it. The element that excites me most is the power that individuals and small groups of people now have to recombine data and services — to use mashups to make sense of the world — particularly in the corner of the world in which I’m immersed (teaching, learning, and research in the context of higher education, libraries, and museums). When I first learned about XML and web services, I thought — wow — this is going to change the way we do research and the way we teach and learn. I spoke about this topic at the O’Reilly ETCon in 2003.

I’ve built a research prototype (called the Scholar’s Box) to enable scholars to gather data from different sources, create personal collections, and share them with others. (I’m an advisor to a project called Zotero (http://www.zotero.org/) — which provides a Firefox plugin to enable people to manage bibliographic collections within the web browser — and ultimately to share their collections.)

I teach a course at the School of Information at UC Berkeley called “Mixing and Remixing Information”. This semester is the third term I have taught the course. It’s a project-based course, in which the focus is on helping students build their own mashups (see http://blog.mixingandremixing.info/s08/class-projects/ for some mashups from this year’s class). A good number of my students have next to no experience with web programming. I have found that showing students the power of mashups — to get people excited about the possibilities — and then teaching them how to make mashups is an excellent way into web programming. I took this approach with teenagers, with some success, last summer when I taught a six-week course on the Berkeley campus.

In addition to master’s students this semester, I’ll be teaching a six-week hands-on course to campus IT staff about building next-generation campus IT services — again by studying things like Flickr, Google Maps, and Yahoo! Pipes, getting them to build mashups, and thinking about how we can do things like that on campus — for administration and for research.

Now that I’m finished writing my book, I’m thinking about other opportunities. Perhaps it’s just the geek in me, but I really do think that some combination of Web 2.0 mashups, a bit more rigor from SOA, imagination, and some understanding of real problems can transform the worlds of education and research (and other worlds too — but education and research are what I know about). I’m setting out to build a small company whose goal is to help the educational community effectively use Web 2.0 ideas (with a specific emphasis on remixability) to change the way we do things in that community. I will confess that my business plan still needs to be written, however. In the meantime, I’m experimenting with a mix of teaching, consulting, and building software. (Some collaborators and I have a grant proposal in to enhance the teaching and learning of art history by integrating Flickr into the computational fabric of the classroom.) Most of all, I believe in the power of ideas — hence, I wrote a book to teach others.

Lots of questions remain, however. Now that my teaching jobs have come to an end, I have some serious amounts of time to plot out my next steps. Writing is a great help to me in sorting out my thoughts, especially when I’m writing for a public audience. I would like to build a business but am unclear on exactly what it should look like. Undoubtedly, there will be details that would be unwise for me to share publicly, but I believe that a lot of my thinking would benefit from putting my ideas out there.

What I’ve been up to

Here’s an update on my current professional activities that I hope will give you, my readers, a sense of where this blog will be heading:

  • My book Pro Web 2.0 Mashups: Remixing Data and Web Services was published by Apress on February 25, 2008.  It’s gotten some good reviews, and I’ve heard from some happy readers. It’s time, however, for some more intense promotion of my book to make sure it fully reaches the audience it is meant to serve. (Most of my book-related activities will be discussed at my MashupGuide blog.)

  • In April, I finished teaching a six-week course (“Building Next-Generation Campus Information Services”) for IT staff on the Berkeley campus. “The course is designed to introduce campus professionals to the concepts of Web 2.0, XML, web services, and elements of web application development through the lens of mashups. While completing a six-week long project, participants will advance their knowledge and abilities, and gain insight into potential solutions to the information management needs they face on the job.” I plan to post more details about the course, including how it was structured, what projects came out of the class, and how I think this course can be improved.

  • Last week marked the culminating open house of the Mixing and Remixing Information course I teach at the School of Information at UC Berkeley. I had a blast teaching the course for the third time, though I wonder whether it’s time for a total (or at least substantial) revamp of the course.

  • I’ve started to contribute regularly to ProgrammableWeb, which I described in my book as “the most useful web site for keeping up with the world of mashups, specifically, the relationships between all the APIs and mashups out there.”  That was before I started writing for it!  See the posts I’ve written for PW so far.

  • Finally, I’ve recently become the Integration Advisor for the Zotero Project, writing developer documentation and thinking about how to integrate Zotero with other things (in a sense, Zotero as a client-side mashup platform) — specifically in the context of the Zotero-Internet Archive alliance.  My work for Zotero will be a big part of what I’ll be discussing on this blog.


Notes from the Open Library developers’ meeting

I wasn’t able to make it to the Open Library Developers Meeting 2008 because I was in Los Angeles, but I look forward to catching up on what happened that day. I’m excited to see how far the Open Library project will get in terms of making data about books freely available to the world, not only through a user interface but also through an API, so that people can mash up the data.


Mashupawards, Symfony and web frameworks

MashupAwards (“best mashups on the web”) is a good list of mashups.

As I learn Django, a Python web programming framework, I’m starting to think about alternative frameworks, such as Ruby on Rails and Symfony (for PHP5). Is Symfony something to recommend to my students?
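Whatever the framework — Django, Rails, or Symfony — each is a stack of conveniences over a thin gateway interface; in Python that layer is WSGI. A minimal framework-free sketch shows what these frameworks abstract away (the response strings are invented for illustration):

```python
def app(environ, start_response):
    """A bare WSGI application: routing, templating, and ORM in a real
    framework all sit above this one callable."""
    if environ.get("PATH_INFO", "/") == "/":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello from a framework-free WSGI app"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]

# To serve it with the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

Comparing how each framework dresses up this same request/response cycle (URL routing, templates, database access) seems like a reasonable way to evaluate them for students.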


A nice example of how useful Amazon EC2 and S3 can be

In a few weeks, I’ll be giving a talk to campus IT staff. I’ve long wanted to talk up the value of services such as Amazon EC2 and S3, but whenever I bring them up, I have tended to talk in the abstract about all the possibilities. I just came across a nice, concrete example on a blog I recently learned about: Self-service, Prorated Super Computing Fun! on open.blogs.nytimes.com, a blog about open source at the NY Times. The post describes how the author used EC2 and S3 to convert millions of articles to PDF files:

    I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. (In fact, it worked so well that we ran it twice, since after we were done we noticed an error in the PDFs.)

Wow, we as individuals have access to more and more computing power at ever lower prices. I’ve long wanted to make use of the EC2 and S3 infrastructure, and I don’t think that many people on campus know about these services. Researchers who need a lot of computational power might build their own clusters or use the central campus services — or they may start using things like EC2 and S3. (That’s my argument.) So far I’ve not had any need for them, but I’m pretty sure that this year will bring some projects my way that will give me an excuse to use both!
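It’s worth redoing the author’s “rough calculations” to see why 100 instances matter. A back-of-envelope sketch — the per-article conversion time is my own assumption, picked only so the numbers land near the figures the post reports:

```python
# Back-of-envelope: why 100 EC2 instances beat 4 machines for the NYT PDF job.
# The per-article time is an assumption chosen to match the reported ~24 hours;
# the article count and instance count come from the post itself.

ARTICLES = 11_000_000
SECONDS_PER_ARTICLE = 0.75  # assumed average conversion time per article

def wall_clock_hours(num_machines):
    """Hours to convert every article, assuming perfect parallelism."""
    total_seconds = ARTICLES * SECONDS_PER_ARTICLE
    return total_seconds / num_machines / 3600

# With 100 instances: a bit under 24 hours, matching the post.
# With only 4 machines: the same job would take roughly 24 days.
```

The lesson generalizes: when a job parallelizes cleanly, renting 100 machines for a day costs about the same as renting 4 machines for 25 days — but you get your answer the next morning.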

(BTW, I’m thrilled to learn about open.blogs.nytimes.com, which lets geeks who are also fans of the Times get a glimpse into the technology behind an important online paper.)


More technical books on my reading list

It’ll be fun to work through Visualizing Data — after I get through reading Programming Collective Intelligence. But instead of just reading books, I need to have some specific problems in mind — which I do. More soon on what those problems are.


Interesting application of scholarly data mining

When I saw Copycat Articles Seem Rife in Science Journals, a Digital Sleuth Finds (Chronicle.com), I was curious about the technology behind the findings. How did the researchers figure out the level of duplication in the medical literature? One aspect was the use of eTBlast. To learn more, I can follow up by reading the Nature News article (How many papers are just duplicates?), which in turn points to the full article (A tale of two citations, Nature).
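I don’t know eTBlast’s internals — it reportedly uses a more sophisticated text-similarity pipeline over abstracts — but the core idea behind this kind of duplicate-spotting can be sketched with simple word-shingle overlap. The example abstracts below are invented:

```python
# Rough sketch of duplicate detection via word-shingle (k-word window) overlap.
# This is NOT eTBlast's algorithm, just the simplest illustration of the idea.

def shingles(text, k=3):
    """Set of k-word shingles (overlapping word windows) from lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets: 1.0 = identical wording."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Invented example abstracts:
original = "we report a randomized trial of drug X in patients with condition Y"
copycat = "we report a randomized trial of drug X in subjects with condition Y"
unrelated = "migration patterns of arctic terns across the northern hemisphere"

# Near-duplicates score high; unrelated text scores zero.
```

Run over pairs of abstracts in a literature database, even something this crude would surface candidate duplicates for a human sleuth to inspect.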

Notelets for 2008.01.17

I look forward to the starting up of the Buckland/Larson/Lynch seminar next week.

I’m pleased to see the word “mash-up” used in an article about a Berkeley website: 01.16.2008 – New life for the New Deal:

    “I realized I couldn’t do it myself,” Brechin says. “It had to be people all over California working collaboratively,” in an echo of the New Deal itself. He turned to the campus’s Institute for Research on Labor and Employment and the California Studies Center, which teamed up to take over the project’s website. Designed and managed by volunteers, the site had been built around a “mash-up” – a database-driven system that could display research on New Deal sites on a dynamically created map — created by Jay McCauley, a retired Silicon Valley software-engineering director.

Time to check out the mashup in question: Living New Deal Project

I’m excited that Aaron Swartz has set up (theinfo):

    This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It’s a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.

I’m going to work in materials from Flickr: The Commons when I come back to building the Scholar’s Box. Such good news — having photos from the Library of Congress hosted at Flickr makes them much more reusable than when the photos sat at LC alone.
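Pulling Commons photos into something like the Scholar’s Box would start with the Flickr API. This sketch only constructs the REST request URL; the API key is a placeholder, and the is_commons parameter reflects my reading of the flickr.photos.search documentation, so verify against the current API reference:

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def commons_search_url(query, api_key="YOUR_API_KEY", per_page=10):
    """Build a flickr.photos.search request limited to Flickr Commons photos."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,   # placeholder: register with Flickr for a real key
        "text": query,
        "is_commons": 1,      # restrict results to The Commons (per the API docs)
        "format": "json",
        "nojsoncallback": 1,
        "per_page": per_page,
    }
    return FLICKR_REST + "?" + urlencode(params)

# Fetching and parsing the JSON response (e.g. with urllib.request) would then
# yield photo IDs ready to drop into a personal collection.
```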

Check out smARThistory.org — the multimedia art history book: Europe:

    This web-book site is being developed by Beth Harris and Steven Zucker as a dynamic enhancement (or even substitute) for the static traditional art history textbook. By using the strengths of podcasting, video, and other Web 2.0 technologies, we think we can better meet the needs of students, faculty, and the interested public. Once this site is better established, we intend to invite the user community to add and edit content.

Gigapan beta program

The commercial Gigapan is a fancy add-on to a digital camera that lets you take a series of photos that can be stitched into a high-resolution panorama. I applied to be part of the beta program, but haven’t heard yet whether I’ve been accepted into it. Will I have the privilege of paying $279 for a prototype?

Update: I just got an email stating that the program had received an overwhelming number of applications and that everyone who applied will hear about their status in the next few weeks.