Nassim Taleb on Big Data

This is a strange article from Nassim Taleb, in which he cautions us about big data:

[B]ig data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.

I had to re-read that passage a few times. It still doesn’t make sense to me when I think of “big data.” As the sample size increases, large deviations due to chance actually become less likely. A comment on the article captures my thoughts well:

This article is misleading. When the media/public talk about big data, they almost always mean big N data. Taleb is talking about data where P is “big” (i.e., many many columns but relative few rows, like genetic microarray data where you observe P = millions of genes for about N = 100 people), but he makes it sound like the issues he discuss apply to big N data as well. Big N data has the OPPOSITE properties of big P data—spurious correlations due to random noise are LESS likely with big N. Of course, the more important issue of causation versus correlation is an important problem when analyzing big data, but one that was not discussed in this article.

So I think Nassim Taleb should offer an explanation on what he means by BIG DATA.
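To make the commenter's big-N versus big-P distinction concrete, here is a minimal simulation sketch. It is not anything from Taleb's article: the function names, the row and column counts, and the choice of Gaussian noise are all assumptions made for illustration. It correlates a pure-noise target with many pure-noise columns and reports the strongest correlation that turns up by chance.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_abs_noise_correlation(n_rows, n_cols, seed=0):
    """Largest |correlation| between a noise target and n_cols noise columns.

    Everything here is random noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_cols):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(target, column)))
    return best


# Hypothetical sizes: same number of columns (P), very different row counts (N).
few_rows = max_abs_noise_correlation(n_rows=50, n_cols=200)
many_rows = max_abs_noise_correlation(n_rows=2000, n_cols=200)
print(f"N=50,   P=200: strongest spurious |r| = {few_rows:.3f}")
print(f"N=2000, P=200: strongest spurious |r| = {many_rows:.3f}")
```

With the number of columns held fixed, the strongest chance correlation should shrink as the number of rows grows, which is the commenter's point about big N.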

The Man Who Killed Osama bin Laden is Struggling

Esquire Magazine details how the man who shot Osama bin Laden is left with no pension and no health insurance. The Shooter, as he is described in the piece, is struggling:

But the Shooter will discover soon enough that when he leaves after sixteen years in the Navy, his body filled with scar tissue, arthritis, tendonitis, eye damage, and blown disks, here is what he gets from his employer and a grateful nation:

Nothing. No pension, no health care, and no protection for himself or his family.

Since Abbottabad, he has trained his children to hide in their bathtub at the first sign of a problem as the safest, most fortified place in their house. His wife is familiar enough with the shotgun on their armoire to use it. She knows to sit on the bed, the weapon’s butt braced against the wall, and precisely what angle to shoot out through the bedroom door, if necessary. A knife is also on the dresser should she need a backup.

Then there is the “bolt” bag of clothes, food, and other provisions for the family meant to last them two weeks in hiding.

“Personally,” his wife told me recently, “I feel more threatened by a potential retaliatory terror attack on our community than I did eight years ago,” when her husband joined ST6.

The text accompanying the headline: “A startling failure of the United States government to help its most experienced and skilled warriors carry on with their lives.” Depressing.

A Cost Analysis of Observation Decks around the World

During my last visit to New York City, I avoided going to the “Top of the Rock” observation deck of the GE Building in favor of this view. In the process, I saved $25 and hours waiting in line.

The Economist published an interesting chart comparing the price of admission with the height of public viewing platforms, sampling the most popular destinations around the world. Topping the list is the new building in London dubbed “The Shard”:

THE SHARD, the latest big skyscraper to pierce London’s skyline and the tallest building in Europe, recently opened for business—and to the general public. Some visitors have marvelled at the view from the top. Others have complained at the hefty entrance fee of £29.95 ($47) for an adult paying on the door. At a mere 244m (800 feet) high, the Shard is poor value for money when measured against its height.

[Chart from The Economist: admission price relative to the height of public viewing platforms]

The Empire State Building ranks third on this list. I think they are using the $42 adult admission price, which covers both the 86th and 102nd floor decks. Using the top deck height of 1,250 feet (381.0m), the price per meter of height comes to 11.02 cents.

Missing from that chart is the price per meter for “Top of the Rock,” which I calculate to be 9.65 cents (850 feet = 259.1m and a $25 admission price). That would make “Top of the Rock” the sixth most expensive observation deck on the list, which isn’t too bad.
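The arithmetic behind those figures is easy to reproduce. This is a small sketch using only the prices and heights quoted in this post; the function name and the dictionary layout are my own, not The Economist's method.

```python
def cents_per_meter(price_usd, height_m):
    """Admission price divided by deck height, in US cents per meter."""
    return round(price_usd / height_m * 100, 2)


# (price in USD, height in meters), as quoted in this post.
decks = {
    "The Shard": (47, 244),
    "Empire State Building": (42, 381.0),
    "Top of the Rock": (25, 259.1),
}

for name, (price, height) in decks.items():
    print(f"{name}: {cents_per_meter(price, height)} cents per meter")
```

By this measure the Shard costs nearly twice as much per meter as either New York deck.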

What other observation towers are you familiar with that The Economist didn’t incorporate on their chart?

FedEx versus the Internet

Question: When, if ever, will the bandwidth of the Internet surpass that of FedEx?

That’s the question Randall Munroe tackles in his latest “what-if” blog post. His conclusion? 2040. But that answer rests on a huge assumption: that Internet transfer rates grow much faster than storage rates on hard drives, SD cards, etc.:

Those thumbnail-sized flakes have a storage density of up to 160 terabytes per kilogram, which means a FedEx fleet loaded with MicroSD cards could transfer about 177 petabits per second, or two zettabytes per day—a thousand times the internet’s current traffic level. (The infrastructure would be interesting—Google would need to build huge warehouses to hold a massive card-processing operation.)

Cisco estimates internet traffic is growing at about 29% annually. At that rate, we’d hit the FedEx point in 2040. Of course, the amount of data we can fit on a drive will have gone up by then, too. The only way to actually reach the FedEx point is if transfer rates grow much faster than storage rates. In an intuitive sense, this seems unlikely, since storage and transfer are fundamentally linked—all that data is coming from somewhere and going somewhere—but there’s no way to predict usage patterns for sure.

While FedEx is big enough to keep up with the next few decades of actual usage, there’s no technological reason we can’t build a connection that beats them on bandwidth. There are experimental fiber clusters that can handle over a petabit per second. A cluster of 200 of those would beat FedEx.

If you recruited the entire US freight industry to move SD cards for you, the throughput would be on the order of 500 exabits—half a zettabit—per second. To match that transfer rate digitally, you’d need to take half a million of those petabit cables.

Fascinating.
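The 2040 figure can be checked with a compound-growth calculation. The sketch below uses the two numbers from the quote (FedEx at roughly a thousand times current traffic, and 29% annual growth). The 2013 base year is my assumption, and the function name is made up for illustration.

```python
import math


def years_to_grow(factor, annual_growth):
    """Years for a quantity to grow by `factor` at a fixed annual growth rate."""
    return math.log(factor) / math.log(1 + annual_growth)


BASE_YEAR = 2013  # assumption: roughly when the what-if post was written

years = years_to_grow(factor=1000, annual_growth=0.29)
print(f"About {years:.1f} years, so around {round(BASE_YEAR + years)}")
```

This holds FedEx's capacity fixed at today's level, which is exactly the caveat Munroe raises: if storage density keeps improving, the target keeps moving.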

Elizabeth Gilbert on Writing

In this Paris Review piece published at the end of 2012, Julian Tepper writes about some (uncharacteristically caustic) writing advice he received from Philip Roth:

I would quit while you’re ahead. Really. It’s an awful field. Just torture. Awful. You write and you write, and you have to throw almost all of it away because it’s not any good. I would say just stop now. You don’t want to do this to yourself. That’s my advice to you.

This week, Elizabeth Gilbert countered with a brilliant post on Bookish.com, a site that also launched this week:

Because, seriously–is writing really all that difficult? Yes, of course, it is; I know this personally–but is it that much more difficult than other things? Is it more difficult than working in a steel mill, or raising a child alone, or commuting three hours a day to a deeply unsatisfying cubicle job, or doing laundry in a nursing home, or running a hospital ward, or being a luggage handler, or digging septic systems, or waiting tables at a delicatessen, or–for that matter–pretty much anything else that people do?

Not really, right?

In fact, I’m going to go out on a limb here and share a little secret about the writing life that nobody likes to admit: Compared to almost every other occupation on earth, it’s f*cking great. I say this as somebody who spent years earning exactly zero dollars for my writing (while waiting tables, like Mr. Tepper) and who now makes many dollars at it. But zero dollars or many dollars, I can honestly say it’s the best life there is, because you get to live within the realm of your own mind, and that is a profoundly rare human privilege. What’s more, you have no boss to speak of. You’re not exposed to any sexual abuse or toxic chemicals on the job site (unless you’re sexually abusing yourself, or eating Doritos while you type). You don’t have to wear a nametag, and–unless you are exceptionally clumsy–you rarely run the risk of cutting off your hand in the machinery. Writing, I tell you, has everything to recommend it over real work.

In fact, maybe that’s why established authors complain so loudly about their tormented existences–so nobody else will find out how great writing actually is, and take their jobs away. (Kind of like those people who come home from amazing holidays, and then lie to their neighbors about how terrible that remote Mexican beach was, just to make sure the place remains undiscovered and unruined forever.)

Or maybe it’s just vanity that makes authors gripe so much about their ordeal. Maybe writers have simply come to believe themselves to be so very special, and their work so very important, that they can’t imagine anybody else capable of doing it: You, little one, could never possibly create what I have created, or withstand all that I have withstood, so you’d best not try at all.

I recommend reading the whole response here.

###

(via Explore)

Monopoly: Now with More Cats

I hope you like cats in your Monopoly. According to The Associated Press, the iron is out and the cat is in, after Hasbro put the fate of its tokens to a Facebook vote:

The results were announced after the shoe, wheelbarrow and iron were neck and neck for elimination in the final hours of voting that sparked passionate efforts by fans to save their favorite tokens, and by businesses eager to capitalize on publicity surrounding pieces that represent their products.

The vote on Facebook closed just before midnight on Tuesday, marking the first time that fans have had a say on which of the eight tokens to add and which one to toss. The pieces identify the players and have changed quite a lot since Parker Brothers bought the game from its original designer in 1935.

I thought this was a really interesting manifestation of the campaign:

The social-media buzz created by the Save Your Token Campaign attracted numerous companies that pushed to protect specific tokens that reflect their products.

That includes garden tool maker Ames True Temper Inc. of Camp Hill, Penn., that spoke out in favor of the wheelbarrow and created a series of online videos that support the tool and online shoe retailer Zappos which pushed to save the shoe.

###

(via Consumerist)

On Hope and Numbers

We try to provide hope, but not false hope.

So we give ranges, starting with the best estimate of survival, because my patients have told me they shut down after they hear the worst estimate. We talk about setting goals, about maximizing quality of life, because we don’t have much leverage with quantity of life. We emphasize spending as much time as possible with family and friends, and as little time as possible with people wearing white coats. We tell them we’re not going to give up if they don’t give up.

But the truth is, we don’t know.

–Dr. Mikkael Sekeres, director of the leukemia program at the Cleveland Clinic, on hope and numbers in this New York Times post. Powerful.

Largest Prime Number Discovered

Back when I was in college, I participated in the great GIMPS Project, searching for what is known as a Mersenne prime (Mersenne primes are of the form 2^X - 1, with the first being 3, 7, 31, and 127, corresponding to X = 2, 3, 5, and 7, respectively). My computer would use its idle resources to help in the search, and while nothing ever came of it, it’s pretty cool to know that I made a modest contribution to the project. So it was great to learn today that the GIMPS Project has found the largest prime number known as of January 2013: 2^57,885,161 - 1. Its discovery is noted in this post:

The new prime number is a member of a special class of extremely rare prime numbers known as Mersenne primes. It is only the 48th known Mersenne prime ever discovered, each increasingly difficult to find. Mersenne primes were named for the French monk Marin Mersenne, who studied these numbers more than 350 years ago. GIMPS, founded in 1996, has discovered all 14 of the largest known Mersenne primes. Volunteers download a free program to search for these primes with a cash award offered to anyone lucky enough to compute a new prime. Chris Caldwell maintains an authoritative web site on the largest known primes as well as the history of Mersenne primes.

To prove there were no errors in the prime discovery process, the new prime was independently verified using different programs running on different hardware. Serge Batalov ran Ernst Mayer’s MLucas software on a 32-core server in 6 days (resource donated by Novartis IT group) to verify the new prime. Jerry Hallett verified the prime using the CUDALucas software running on a NVidia GPU in 3.6 days. Finally, Dr. Jeff Gilchrist verified the find using the GIMPS software on an Intel i7 CPU in 4.5 days and the CUDALucas program on a NVidia GTX 560 Ti in 7.7 days.

This largest prime number contains 17,425,170 digits. If you have a fast Internet connection, you can see how huge this number is (with all of its digits written out one by one) by clicking here. Pretty cool.
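For the curious, here is a toy sketch of the Lucas-Lehmer test, the standard primality test for Mersenne numbers (the MLucas and CUDALucas programs mentioned above are named after it). This is not the GIMPS code, which relies on heavily optimized arithmetic. The function names are my own, and the loop is only practical for small exponents. The digit count, however, can be checked directly with logarithms.

```python
import math


def is_prime(n):
    """Trial division; fine for the small exponents used here."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime (Lucas-Lehmer test)."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the iteration below needs an odd prime p
    if not is_prime(p):
        return False  # 2**p - 1 is composite whenever p is composite
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Number of decimal digits in 2**p - 1, without computing the number."""
    return math.floor(p * math.log10(2)) + 1


print([p for p in range(2, 32) if lucas_lehmer(p)])
print(mersenne_digits(57885161))
```

The first line lists the exponents below 32 that give Mersenne primes, and the second reproduces the digit count of the new record.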

The Great Norwegian Diaper Arbitrage

Matthew O’Brien reports on an interesting scheme going on in Europe: people from certain European countries are driving to Norway and emptying store shelves of diapers. Why? Because they can resell these diapers in their home countries for double the price.

There are lots of ways supermarkets can get customers in the door, and away from the competition. But in parts of Norway, cut-rate diapers have become the preferred lure. It’s set off something of a price war, which would be great news for Norwegian parents if they could actually find diapers in stock. They can’t. As Reuters reports, prices are so enticingly low that foreigners, mostly Poles and Lithuanians, have started trekking to Norway for the sole purpose of buying up every last diaper they can find.

Here’s how the arbitrage math adds up. The ferry costs approximately $275 round trip, and gas is about $8 a gallon in Sweden, which, if we assume our car gets around 30 miles per gallon, gives us $435 in expenses. Throw in food, lodging, and other miscellaneous costs, and the total should come in around $600 or so. Remember, diapers cost more than twice as much in Lithuania as they do in Norway, so we only need to buy that much to break even. In other words, if we buy just $600 worth, which we can resell in Lithuania for double, we can cover our basic costs — and we can make enough profit to make the whole trip worth our while if we buy another couple hundred dollars worth. Of course, $1,000 worth isn’t very much when it comes to diaper arbitrage; Norwegian customs officials have seen people pack their cars with as much as $9,000 worth — good for more than $8,000 of profit. Not too shabby.

I don’t see how these prices can remain at such low levels in Norway for the foreseeable future…
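The break-even logic in the quote is simple enough to sketch. The numbers below come from the article (about $600 in trip costs, resale at double the Norwegian price). The function name and the idea of treating the resale price as exactly double are simplifying assumptions on my part.

```python
def arbitrage_profit(spend_usd, trip_cost_usd=600.0, resale_multiple=2.0):
    """Profit from buying diapers in Norway and reselling them at home.

    Assumes everything bought is resold at `resale_multiple` times its cost.
    """
    revenue = spend_usd * resale_multiple
    return revenue - spend_usd - trip_cost_usd


for spend in (600, 1000, 9000):
    print(f"Buy ${spend:,} worth: profit ${arbitrage_profit(spend):,.0f}")
```

At exactly double, every dollar spent beyond the first $600 is a dollar of profit, which is why a $9,000 carload clears more than $8,000.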
###

Thierry Cohen’s Stunning Series on Darkened Cities

What would New York City, San Francisco, or Shanghai look like with a full sky of brilliant stars? Thierry Cohen, a French photographer, thinks he can show us by blending city scenes (shot and altered to eliminate lights and other pollution) with the night skies from less populated locations that fall on the same latitude on Earth. The result is what city dwellers might see in the absence of any light pollution.

According to the NYT:

Paris gets the stars of northern Montana, New York those of the Nevada desert. As Cohen, whose work will be exhibited at the Danziger Gallery in New York in March, sees it, the loss of the starry skies, accelerated by worldwide population growth in cities, has created an urbanite who “forgets and no longer understands nature.” He adds, “To show him stars is to help him dream again.”

Below, a sample of these stunning photographs:

Shanghai without smog and light pollution.

San Francisco.

Starry New York City.

Los Angeles without the light pollution.

Hong Kong by night.

See the entire series on Thierry Cohen’s website.