Scoble gets booted off Facebook for scraping his personal data

Photo courtesy of Laughing Squid

Just today, blogerati Robert Scoble got booted off Facebook. He was privately testing Plaxo’s Data Importer on Facebook, a new feature of Plaxo Pulse (to sync contacts from web services). Since Facebook doesn’t provide APIs that let developers grab their own data off the network (e.g. your contacts), that’s where the dirtier means of scraping comes in.

If you are trying to contact me on Facebook, please don’t. My account has been “disabled” for breaking Facebook’s Terms of Use. I was running a script that got them to keep me from accessing my account. I’m appealing. I’ll tell you what I was doing as soon as I talk with the developers who built what I was using and as soon as I talk with Facebook’s support (I sent an email in reply to the one below, but haven’t heard back yet).

Web scraping generically describes any of various means to extract content from a website over HTTP for the purpose of transforming that content into another format suitable for use in another context. You could do it by hand (like copying and pasting text), but it’ll hurt if there are lots of pages. In most cases, you could use a script or a program to do it for you, but that’s quite detectable by most systems since it results in high traffic coming from a single IP address.
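
To make the "extract, then transform" idea concrete, here is a minimal sketch in Python using only the standard library. It parses a made-up contact list and turns it into CSV. The markup (`li class="contact"`, the `data-email` attribute), the sample names and the function names are all assumptions for illustration; this is not Facebook's page structure, nor the script Scoble was running.

```python
from html.parser import HTMLParser
import csv
import io


class ContactParser(HTMLParser):
    """Collect (name, email) pairs from <li class="contact"> elements."""

    def __init__(self):
        super().__init__()
        self.contacts = []
        self._in_contact = False
        self._email = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Hypothetical markup: only list items tagged as contacts matter.
        if tag == "li" and attrs.get("class") == "contact":
            self._in_contact = True
            self._email = attrs.get("data-email", "")
            self._text = []

    def handle_data(self, data):
        if self._in_contact:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_contact:
            name = "".join(self._text).strip()
            self.contacts.append((name, self._email))
            self._in_contact = False


def contacts_to_csv(html):
    """Extract contacts from HTML and re-emit them as CSV text."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "email"])
    writer.writerows(parser.contacts)
    return out.getvalue()


# Invented sample page standing in for a downloaded contact list.
SAMPLE_PAGE = """
<ul>
  <li class="contact" data-email="alice@example.com">Alice Tan</li>
  <li class="contact" data-email="bob@example.com">Bob Lim</li>
  <li class="ad">Sponsored link</li>
</ul>
"""

if __name__ == "__main__":
    print(contacts_to_csv(SAMPLE_PAGE))
```

The sketch works on a page that has already been downloaded. A real scraper would also have to fetch each page over HTTP, and that is the part that generates the tell-tale traffic.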

It’s rather coincidental, since Wired’s latest issue has a great feature entitled “Should Web Giants Let Startups Use the Information They Have About You?”. It’s a timely reminder of how this becomes an issue as earlier, traditional web service business models collide with Web 2.0 philosophy.

Unfortunately, Facebook isn’t exactly being open with their business. They’re pretty much an old-school business run by persistent and lucky teenagers (remember Facebook’s fumble with Beacon and NewsFeed?).

Desperately hoarding your own personal data and telling you how you are allowed to use it is so communist, it’s not even funny (see my “Walled Garden” story).

Perhaps this is why Scott Gilbertson of Wired spoke so eloquently on the issue:

[I]n the Facebook world, your data is welcome to come in, but there’s no going out. It turns out Facebook is a bit like the Hotel California: “we are programmed to receive / You can check-out any time you like / But you can never leave!”

Just a word of caution: do NOT publish solely on Facebook; publish widely beyond its walls.

Question: So why are we still hooked on Facebook?

My take: It focuses extremely well on friends and their activities. Unlike typical forums and social networks, Facebook does have a neat balance of personality (internal self) and shared content (external self). It’s a fully identified Internet within an Internet where the norm seems to be sharing accurate information about oneself, thereby allowing others to find you. For instance, just look at the number of people who disclose actual birthdays… I don’t think that was ever a norm in the past! In turn, Facebook ends up with precise demographic data about everyone, perfect for advertisers (try putting up a Facebook ad to see the array of targeting options!). As much as we might hate the Facebook government, the traction we’ve built for ourselves with it (i.e. no. of friends) makes it harder to leave.

Summary: For all its insidious worth, Facebook has pretty much got this game locked down tight. Heck, the user population growth isn’t likely to plateau since there’s always something new; Facebook largely depends on user-generated content (e.g. your posts, photos, videos, apps). Unless Facebook pisses more users off, I don’t think anyone would be the wiser to leave. I guess they’d only open up if it’s their last resort.

Call to Action: If you’re like me, feeling up in arms, perhaps you could consider joining the Data Portability initiative at dataportability.org. They’re about making all forms of personal data discoverable, and shared between our chosen tools or vendors. Yes, we need a DHCP for Identity. ;)

Aside: I’m a huge fan of Plaxo’s contact syncing service, since it helps me keep contacts in my Mac’s address book (and thus iPhone), Gmail, LinkedIn, Yahoo, and many more all in sync. If it detects that any of my contacts is also a Plaxo user (matched via email address), anytime that user makes a change to his contact info, I get the data synced up automatically. It’s clever, and it’s what being Web 2.0 is all about… transparency between web services. This is the kind of open network I wish to have.

  • nuMentally

    Facebook is like Hotel California? Holycrap! Facebook is evil. That being said, please add me as your friend. I need more friends.

  • http://tubagbohol.mikeligalig.com Bol-anon

    Facebook still owns Scoble’s data. Thou shall not steal, I can hear Facebook saying.

  • http://shadyproject.net shady

    I’m not so sure that the facebook API doesn’t let you access that particular information. After (briefly) perusing the facebook API documentation, it looks like there are methods available both to get the user’s profile and their friends:

    facebook.users.getInfo, facebook.profile.getFBML, and facebook.friends.get would all do the job. That doesn’t even include using a custom FQL query to get the job done.

    The practical limitation seems to be that you can only get this information inside a facebook application that is installed per user. So in order to truly mine all the information in a sort of connected graph (without requiring a user to actually install an app) you would need to resort to web scraping as you said. I’m not sure why they didn’t include some kind of random wait time between requests. They can’t be in that much of a hurry that waiting a random amount of time between 30 seconds and two minutes per request per account would be detrimental. Throw multiple accounts and a few web proxies into the mix, and avoiding the “hey jerky, stop scraping our site” warnings doesn’t seem like it would be too hard.
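
    The random wait described above can be sketched in a few lines of Python. This is only an illustration of pacing: the helper name `paced` and its parameters are invented here, and the 30 to 120 second window is simply the figure from the comment.

```python
import random
import time


def paced(items, min_wait=30.0, max_wait=120.0, sleep=time.sleep, rng=random):
    """Yield each item, pausing a random interval between consecutive items.

    `sleep` and `rng` are injectable so the pacing can be checked
    without actually waiting.
    """
    first = True
    for item in items:
        if not first:
            # Wait a uniformly random number of seconds before the next item.
            sleep(rng.uniform(min_wait, max_wait))
        first = False
        yield item
```

    A loop such as `for url in paced(urls): fetch(url)` (where `fetch` is whatever hypothetical download function is in use) would then spread its requests out instead of firing them back to back.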

    I doubt that there would be much of a problem with Scoble scraping his own data, and maybe his friends list. The problem (and potential privacy issues) comes when he starts scraping his friends list, and their data, and his friends’ friends and so on. His data may belong to him (and facebook), but his friends’ and their friends’ data belongs to them (and facebook).

    The best solution in my mind would be to write a facebook app that grabs all the information in your profile, assigns you a unique (anonymized? given the nature of facebook that seems to not really be an issue) identifier based on your name, birth date and maybe some other unique info, assigns the same kind of unique identifier to each of your friends, and then exports the data into a common open XML format.

    If the method used to assign the unique id to yourself and your friends is deterministic (i.e. if two different people have the same friend, that friend will have the same unique id in both sets of exported data) you would end up with all the social network connections that facebook has, without actually needing to scrape facebook to get it.
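
    A rough sketch of that deterministic-id idea, in Python with the standard library only. Everything here is an assumption for illustration: the choice of SHA-256 over a normalized "name|birth date" string, the `profile`/`friend` element names, and the sample people. It is not an existing Facebook app or an agreed open format.

```python
import hashlib
import xml.etree.ElementTree as ET


def person_id(name, birth_date):
    """Derive a stable identifier from a normalized name and birth date."""
    key = f"{name.strip().lower()}|{birth_date.strip()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def export_profile(owner, friends):
    """Serialize one person's friend list as XML.

    `owner` and each friend are (name, birth_date) tuples.
    """
    root = ET.Element("profile", id=person_id(*owner))
    for friend in friends:
        ET.SubElement(root, "friend", id=person_id(*friend))
    return ET.tostring(root, encoding="unicode")


def merge_edges(*exports):
    """Combine several exports into one set of undirected friendship edges."""
    edges = set()
    for xml_text in exports:
        root = ET.fromstring(xml_text)
        for friend in root.findall("friend"):
            edges.add(frozenset((root.get("id"), friend.get("id"))))
    return edges


if __name__ == "__main__":
    # Invented people: Alice and Bob both list Carol as a friend.
    alice = ("Alice Tan", "1985-03-02")
    bob = ("Bob Lim", "1984-11-20")
    carol = ("Carol Ng", "1986-07-15")
    merged = merge_edges(export_profile(alice, [carol]),
                         export_profile(bob, [carol]))
    print(len(merged), "edges; Carol's id is shared by both exports")
```

    Because the id depends only on the inputs, two separate exports that mention the same friend line up without any central coordination. Note that a hash of a name and birth date is guessable by anyone who knows both, so it identifies people consistently rather than anonymizing them.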

    The problem would be to get everyone to install the application. Of course, the kind of people who support OpenSocial and the other open data standards would probably install the application willingly. But, given the terms of service everyone agrees to when they install any facebook app, this kind of functionality could easily be wrapped up in a trojan horse fashion (e.g. check out this cool new game!) which would allow you to potentially get the information for a much wider range of facebook users.

    Which leads to the question: if an application does that, would it meet the definition of spyware? And if so, would we have to call it spyware 2.0?