For Factual, The World Is One Big Data Problem

This interesting article profiles Gil Elbaz, a Caltech graduate, and Factual, the company he founded:

Geared to both big companies and smaller software developers, it includes available government data, terabytes of corporate data and information on 60 million places in 50 countries, each described by 17 to 40 attributes. Factual knows more than 800,000 restaurants in 30 different ways, including location, ownership and ratings by diners and health boards. It also contains information on half a billion Web pages, a list of America’s high schools and data on the offices, specialties and insurance preferences of 1.8 million United States health care professionals. There are also listings of 14,000 wine grape varietals, of military aircraft accidents from 1950 to 1974, and of body masses of major celebrities. Odd facts matter too, Mr. Elbaz notes.

He keeps 500 terabytes of storage near Factual’s headquarters. That’s about twice the amount needed to hold the entire Library of Congress. He has more data stored inside Amazon’s giant cloud of computers. His statisticians have cleaned and corrected data to account for things like how different health departments score sanitation, whether the term “middle school” means two years or three in a particular town, and whether there were revisions between an original piece of data and its duplicate.

A quote from Mr. Elbaz: “Having money is overrated when you are brought up not to believe you are entitled to it…You can make enough money to not need things, or you can just not need things.”



Stephen Wolfram on Personal Data Analytics

Stephen Wolfram, the designer of Mathematica, believes that someday everyone will routinely collect all sorts of data about themselves.

In a fascinating blog post, Wolfram admits that he’s been collecting data about himself for many years (since 1990!) but, until now, hadn’t had the chance to truly analyze it. Using the data analytics tools in the latest release of Wolfram Alpha, he summarizes his outgoing and incoming email (on a daily and monthly basis), the keystrokes he’s typed on his computers, how much time he’s spent on the telephone, and the number of steps he’s taken each day (since 2010). He makes the following observation about the patterns in his data:

The overall pattern is fairly clear. It’s meetings and collaborative work during the day, a dinner-time break, more meetings and collaborative work, and then in the later evening more work on my own. I have to say that looking at all this data I am struck by how shockingly regular many aspects of it are. But in general I am happy to see it. For my consistent experience has been that the more routine I can make the basic practical aspects of my life, the more I am able to be energetic—and spontaneous—about intellectual and other things.

Wolfram mentions that the data he presents in the blog post only scratches the surface of what he’s collected over the years. He’s also got years of curated medical test data, his complete genome, GPS location tracks, room-by-room motion sensor data, and “endless corporate records.” I am guessing a follow-up post from him will be forthcoming some day.

As for Wolfram’s conclusions about the future of personal analytics?

There is so much that can be done. Some of it will focus on large-scale trends, some of it on identifying specific events or anomalies, and some of it on extracting “stories” from personal data.

And in time I’m looking forward to being able to ask Wolfram|Alpha all sorts of things about my life and times—and have it immediately generate reports about them. Not only being able to act as an adjunct to my personal memory, but also to be able to do automatic computational history—explaining how and why things happened—and then making projections and predictions.

As personal analytics develops, it’s going to give us a whole new dimension to experiencing our lives. At first it all may seem quite nerdy (and certainly as I glance back at this blog post there’s a risk of that). But it won’t be long before it’s clear how incredibly useful it all is—and everyone will be doing it, and wondering how they could have ever gotten by before. And wishing they had started sooner, and hadn’t “lost” their earlier years.

Definitely check out Stephen Wolfram’s detailed and insightful post. And if you’re interested in data analytics, this site is a great resource. I also recommend watching the brief TED talk “The Quantified Self” by Gary Wolf.