This is a big, important topic. I've been thinking about it, and working on plans to deal with it, for a while now. Please, tell me what you think.

The web has evolved a lot, both technically and otherwise. It has evolved in how it solves problems and in which problems it aims to solve. It has evolved in both what it is and why it is.

I'll be writing for a while on the subject, in chunks. This is kind of an experiment in my ongoing hunt for a better way to write and get my ideas out. Tell me if you think this is a good idea, or complain if it just annoys you.

The Problem With Web Data


What does it mean to "own" my data in a web-based world? I don't know what it means yet, but we should know.

Before "The Web" took on its growing responsibilities for us, owning my data meant the files sat on my own machine. It meant the files were in a format that was either human readable, or documented well enough that alternative tools could read or convert them. It meant no vendor could die, or change its mind, and make my data useless to me.

This isn't the case on the web today.

On the web, my data sits in a database on a machine someone else owns. They make backups, I hope. The information, even if I am said to own it, sits in a format I do not know and cannot see. Even if I got a copy, I wouldn't know what to do with it. The cloud is the Microsoft Word of our day.

The location of the data is shifting. In a potential future where IndexedDB and friends are the norm and our web applications are just deployment vectors, we would once again work with data sitting on our own machines. But after long battles for open formats and documented data, we are suddenly being thrust into a world where all of our own data, on our own machines, is obscured in thousands of tiny web apps, each with its own "format" for dumping our data into the browser's storage sandbox. The data is on my machine, but is the data mine?

What good was the fight for open file formats, when even our local data isn't being stored in a "file" the way we've known it up to now?

We've been fighting for a while to keep access open to our own data on servers, but this time we need to solve the problem before it comes up. We need to make sure we own the data inside our own browsers before it gets locked away from us in the first place.

These are the questions we need to answer:

  • What should we demand of services that hold our data?
  • How can we position this to benefit the service providers, incentivizing them to do the right thing?
  • Can browser-side storage replace the filesystem for a user?
  • Do we even want this?
  • How can web standards, browser vendors, and users push for the right future to happen?

We need to answer these questions early, this time around.


Owning Your Cloud Data


The best-known, yet still problematic, area of data ownership in our web-based world is all the data we keep housed "in the cloud" on the machines of the services we use every day.

  • The contents of your blog at Tumblr
  • The documents your company shares on Google Drive
  • Your to-do list at Nozbe or Remember the Milk
  • The music you've bought and listen to at Amazon
  • All your family photos kept safely (you hope) on Flickr

All of these services hold information that is important to you, that you depend on, and that in many cases defines a large part of who you are (your writings, your photos, your musical tastes). How often do you plan for these services having a major outage? Or vanishing entirely and forever? Or suffering some kind of terrible data loss?

What if you just decide they aren't the right fit, and you want to take your business (and your data) elsewhere? Do you have that option? Is it a painful option, if it even exists?

Most of us, even the technical among us, don't think about this at all.

Many have been pushing for better access to our own data, from both outside and inside the organizations we depend on, and we have them to thank for the cloud data over which we can exercise some moderate feeling of ownership. Unfortunately, the victories are as varied as they are rare.

  • We still depend on DRM servers to verify we're allowed to watch movies we own
  • Our Gmail accounts can be fetched locally, but only through IMAP access, not through an explicit export option
  • My to-do list can be synced by third-party tools, but they're banned from the service if they provide any new features on that data. Do I really own my to-do list?

We have a lot of work to do.

Obviously, it can be seen as a business problem not to keep some grip on this data. I won't say I don't blame vendors (I do blame them), but I understand their position just the same. If we want real change, we need to make the case for private data access on business merits and give them a reason to support our plight that even their accountants can get behind.

How can a business benefit from letting its customers walk away?

  • Access to your data is an excellent selling point. I'll pay a premium to know my data is my data
  • It may make it easier to leave, but it also makes it easier to try your service out, knowing that I can back out if I don't like it. Let me test the waters!
  • The same access that lets me walk away also lets me do more with your service. Dropbox is a great example of this.

A friend recently told me that "owning your data and web apps are mutually exclusive," and I'm confident we can prove him wrong.
