
Photo courtesy of Laughing Squid
Just today, blogerati Robert Scoble got booted off Facebook. He was privately testing Plaxo’s Data Importer on Facebook, a new feature of Plaxo Pulse (to sync contacts from web services). Facebook doesn’t provide APIs that let developers grab a user’s own data off the network (e.g. your contacts), so that’s where the dirtier means of scraping comes in.
If you are trying to contact me on Facebook, please don’t. My account has been “disabled” for breaking Facebook’s Terms of Use. I was running a script that got them to keep me from accessing my account. I’m appealing. I’ll tell you what I was doing as soon as I talk with the developers who built what I was using and as soon as I talk with Facebook’s support (I sent an email in reply to the one below, but haven’t heard back yet).
Web scraping generically describes any of various means to extract content from a website over HTTP for the purpose of transforming that content into another format suitable for use in another context. You could do it by hand (like copying and pasting text), but it’ll hurt if there are lots of pages. In most cases, you could use a script or a program to do it for you, but that’s quite detectable by most systems, since it results in a high volume of requests coming from a single IP address.
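To make the "extract, then transform" idea concrete, here is a minimal sketch in Python using only the standard library. The HTML snippet, its class names, and the helper names (`ContactScraper`, `scrape_contacts`, `to_csv`) are all made up for illustration; this is not Plaxo's importer, and it doesn't touch Facebook or any real site. It parses a page that has already been downloaded and turns the contacts in it into CSV.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a page already fetched over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="contact"><span class="name">Ada Example</span>
      <a href="mailto:ada@example.com">email</a></li>
  <li class="contact"><span class="name">Bob Example</span>
      <a href="mailto:bob@example.com">email</a></li>
</ul>
"""


class ContactScraper(HTMLParser):
    """Collects name/email pairs from <li class="contact"> items."""

    def __init__(self):
        super().__init__()
        self.contacts = []
        self._current = None   # contact being assembled, if any
        self._in_name = False  # True while inside <span class="name">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "contact":
            self._current = {"name": "", "email": ""}
        elif self._current is not None:
            href = attrs.get("href") or ""
            if tag == "span" and attrs.get("class") == "name":
                self._in_name = True
            elif tag == "a" and href.startswith("mailto:"):
                self._current["email"] = href[len("mailto:"):]

    def handle_data(self, data):
        if self._in_name:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_name = False
        elif tag == "li" and self._current is not None:
            self.contacts.append(self._current)
            self._current = None


def scrape_contacts(html):
    """Extract step: HTML text in, list of contact dicts out."""
    parser = ContactScraper()
    parser.feed(html)
    return parser.contacts


def to_csv(contacts):
    """Transform step: contact dicts in, CSV text out."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "email"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(contacts)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(scrape_contacts(SAMPLE_HTML)))
```

A real scraper would add the fetching part in a loop over many pages, which is exactly the burst of traffic that gets noticed.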
It’s rather coincidental, since Wired’s latest issue has a great feature entitled “Should Web Giants Let Startups Use the Information They Have About You?”. It’s a timely reminder of how this becomes an issue as earlier, traditional web service business models collide with Web 2.0 philosophy.
Unfortunately, Facebook isn’t exactly being open with their business. They’re pretty much an old-school business run by persistent and lucky teenagers (remember Facebook’s fumble with Beacon and NewsFeed?).
Desperately holding onto your personal data and telling you how you are allowed to use it is so communist, it’s not even funny (see my “Walled Garden” story).
Perhaps this is why Scott Gilbertson of Wired spoke so eloquently on the issue:
[I]n the Facebook world, your data is welcome to come in, but there’s no going out. It turns out Facebook is a bit like the Hotel California: “we are programmed to receive / You can check-out any time you like / But you can never leave!”
Just a word of caution: do NOT produce solely on Facebook, but publish widely beyond its walls.
Question: So why are we still hooked on Facebook?
My take: It focuses extremely well on friends and their activities. Unlike typical forums and social networks, Facebook does have a neat balance of personality (internal self) and shared content (external self). It’s a fully identified Internet within an Internet where the norm seems to be sharing accurate information about oneself, thereby allowing others to find you. For instance, just look at the number of people who disclose actual birthdays… I don’t think that was ever a norm in the past! In turn, Facebook ends up with precise demographic data about everyone, perfect for advertisers (try putting up a Facebook ad to see the array of targeting options!). As much as we might hate the Facebook government, the traction we’ve built for ourselves with it (i.e. no. of friends) makes it harder to leave.
Summary: For all its insidious worth, Facebook has pretty much got this game locked down tight. Heck, user population growth isn’t likely to plateau since there’s always something new; Facebook largely depends on user-generated content (e.g. your posts, photos, videos, apps). Unless Facebook pisses more users off, I don’t think many people will see a reason to leave. I guess they’d only open up if it’s their last resort.
Call to Action: If you’re like me, feeling up in arms, perhaps you could consider joining the Data Portability initiative at dataportability.org. They’re about making all forms of personal data discoverable, and shared between our chosen tools or vendors. Yes, we need a DHCP for Identity. ;)
Aside: I’m a huge fan of Plaxo’s contact syncing service, since it helps me keep contacts in my Mac’s address book (and thus iPhone), Gmail, LinkedIn, Yahoo, and many more all in sync. If it detects that any of my contacts is also a Plaxo user (matched via email address), anytime that user makes a change to their contact info, I get the data synced up automatically. It’s clever, and it’s what being Web 2.0 is all about… transparency between web services. This is the kind of open network I wish to have.
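The email-matching idea is simple enough to sketch. This is only my guess at the general shape, not how Plaxo actually does it; the function name `sync_contacts` and the dict layout are assumptions for illustration. Each address-book entry is matched to a service profile by (case-insensitive) email address, and any fields that differ are copied across.

```python
def sync_contacts(address_book, service_profiles):
    """Update address-book entries from profiles that share an email address.

    Both arguments are lists of dicts with at least an "email" key
    (a hypothetical layout). Returns the emails of the entries that changed.
    """
    # Index profiles by lowercased email so matching ignores case.
    by_email = {p["email"].lower(): p for p in service_profiles}

    updated = []
    for contact in address_book:
        profile = by_email.get(contact["email"].lower())
        if profile is None:
            continue  # this contact is not a user of the service
        # Collect only the fields whose values actually differ.
        changes = {
            key: value
            for key, value in profile.items()
            if key != "email" and contact.get(key) != value
        }
        if changes:
            contact.update(changes)
            updated.append(contact["email"])
    return updated


if __name__ == "__main__":
    book = [
        {"email": "Ada@example.com", "phone": "111"},
        {"email": "bob@example.com", "phone": "222"},
    ]
    profiles = [{"email": "ada@example.com", "phone": "999"}]
    print(sync_contacts(book, profiles), book)
```

Matching on email keeps things cheap, but it only works when both sides expose the address, which is the very data the walled garden holds back.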


