Data Science of the Facebook World

The ever-insightful Stephen Wolfram has another graph-heavy post, this time compiling data on Facebook analytics:

More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes.

A few weeks ago we decided to start analyzing all this data. And I have to say that if nothing else it’s been a terrific example of the power of Mathematica and the Wolfram Language for doing data science. (It’ll also be good fodder for the Data Science course I’m starting to create.)

We’d always planned to use the data we collect to enhance our Personal Analytics system. But I couldn’t resist also trying to do some basic science with it.

I’ve always been interested in people and the trajectories of their lives. But I’ve never been able to combine that with my interest in science. Until now. And it’s been quite a thrill over the past few weeks to see the results we’ve been able to get. Sometimes confirming impressions I’ve had; sometimes showing things I never would have guessed. And all along reminding me of phenomena I’ve studied scientifically in A New Kind of Science.

So what does the data look like? Here are the social networks of a few Data Donors, with clusters of friends given different colors. (Anyone can find their own network using Wolfram|Alpha, or the SocialMediaData function in Mathematica.)

It’s a pretty fascinating read.

My favorite graph was this one, showing the distribution of your Facebook friends’ ages versus your age:

The age of your Facebook friends versus your age.

It’s also quite interesting how the marriage statistics from Facebook line up with the official Census data:

Facebook marriage age vs. Census data.

For a lot more analysis, read Stephen Wolfram’s entire post.

Stephen Wolfram on Personal Data Analytics

Stephen Wolfram, the designer of Mathematica, believes that someday everyone will routinely collect all sorts of data about themselves.

In a fascinating blog post, Wolfram admits that he has been collecting data for many years (since 1990!) but, until now, hadn’t had the chance to truly analyze it. Using the data analytics tools in the latest release of Wolfram|Alpha, he summarizes his outgoing and incoming email (on a daily and monthly basis), the keystrokes he has typed on his computers, how much time he has spent on the telephone, and the number of steps he has taken each day (since 2010). He makes the following observation about his data collection:

The overall pattern is fairly clear. It’s meetings and collaborative work during the day, a dinner-time break, more meetings and collaborative work, and then in the later evening more work on my own. I have to say that looking at all this data I am struck by how shockingly regular many aspects of it are. But in general I am happy to see it. For my consistent experience has been that the more routine I can make the basic practical aspects of my life, the more I am able to be energetic—and spontaneous—about intellectual and other things.
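If you want to try this kind of daily, monthly, and time-of-day summary on your own records, here is a minimal Python sketch. It is not Wolfram's method (his analysis uses his own tools). The function name `summarize_events` and the sample timestamps are made up for illustration, and the sketch assumes you have already exported your events (sent emails, for example) as a list of `datetime` values.

```python
from collections import Counter
from datetime import datetime


def summarize_events(timestamps):
    """Count events per calendar day, per month, and per hour of day."""
    per_day = Counter(ts.date().isoformat() for ts in timestamps)
    per_month = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    per_hour = Counter(ts.hour for ts in timestamps)
    return per_day, per_month, per_hour


# Made-up sample timestamps, standing in for real records
# such as the times at which emails were sent.
sample = [
    datetime(2012, 3, 1, 9, 15),
    datetime(2012, 3, 1, 22, 40),
    datetime(2012, 3, 2, 9, 5),
    datetime(2012, 4, 10, 14, 30),
]

per_day, per_month, per_hour = summarize_events(sample)
print(per_month)
```

The per-hour counts are the ones that would reveal a daily rhythm like the one Wolfram describes, such as a dinner-time dip followed by late-evening activity.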

Wolfram mentions that the data he presents in the blog post only scratches the surface of what he has collected over the years. He also has years of curated medical test data, his complete genome, GPS location tracks, room-by-room motion sensor data, and “endless corporate records.” I am guessing a follow-up post from him will be forthcoming some day.

As for Wolfram’s conclusions about the future of personal analytics?

There is so much that can be done. Some of it will focus on large-scale trends, some of it on identifying specific events or anomalies, and some of it on extracting “stories” from personal data.

And in time I’m looking forward to being able to ask Wolfram|Alpha all sorts of things about my life and times—and have it immediately generate reports about them. Not only being able to act as an adjunct to my personal memory, but also to be able to do automatic computational history—explaining how and why things happened—and then making projections and predictions.

As personal analytics develops, it’s going to give us a whole new dimension to experiencing our lives. At first it all may seem quite nerdy (and certainly as I glance back at this blog post there’s a risk of that). But it won’t be long before it’s clear how incredibly useful it all is—and everyone will be doing it, and wondering how they could have ever gotten by before. And wishing they had started sooner, and hadn’t “lost” their earlier years.

Definitely check out Stephen Wolfram’s detailed and insightful post. And if you’re interested in data analytics, this site is a great resource. I also recommend watching the brief TED talk “The Quantified Self” by Gary Wolf.