On NYC’s Open Data Portal and Parking Tickets

The author of the I Quant NY blog profiles an excellent use of NYC’s Open Data portal in a post detailing how the city has been systematically ticketing legally parked cars:

As of late 2008, in NYC you can park in front of a sidewalk pedestrian ramp, as long as it’s not connected to a crosswalk.  It’s all written up in the NYC Traffic Rules, and for more detail, take a look at this article.

Is it a problem that drivers don’t realize that there are some extra parking spots they are now allowed to park in?  Not so much.  But, I’ve got a pedestrian ramp leading to nowhere particular in the middle of my block in Brooklyn, and on occasion I have parked there.  Despite the fact that it is legal, I’ve been ticketed for parking there.  Though I get the tickets dismissed, it’s a waste of everybody’s time. And that got me wondering- How common is it for the police to give tickets to cars legally parked in front of pedestrian ramps?  It couldn’t be just me…

In the past, there was not much you could do to stop something like this. Complaining to your local precinct would at best only solve the problem locally.  But thanks to NYC’s Open Data portal, I was able to look at the most common parking spots in the City where cars were ticketed for blocking pedestrian ramps.   It’s worth taking a moment upfront here to praise the NYPD for offering this dataset to begin with.  Though we are behind on police crime data in the city, we are ahead in other ways and the parking ticket dataset is definitely one of them.  
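For the curious, the core of that kind of analysis is small: filter the tickets down to one violation, then count how often each address appears. Below is a minimal Python sketch of the idea. It is not the author's actual code; the field names (`violation`, `house_number`, `street`), the violation label, and the sample rows are all made up for illustration, and the real dataset's columns may differ.

```python
from collections import Counter

# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE_TICKETS = [
    {"violation": "PEDESTRIAN RAMP", "house_number": "10", "street": "EXAMPLE ST"},
    {"violation": "PEDESTRIAN RAMP", "house_number": "22", "street": "SAMPLE AVE"},
    {"violation": "PEDESTRIAN RAMP", "house_number": "10", "street": "EXAMPLE ST"},
    {"violation": "EXPIRED METER", "house_number": "10", "street": "EXAMPLE ST"},
    {"violation": "PEDESTRIAN RAMP", "house_number": "5", "street": "DEMO BLVD"},
    {"violation": "PEDESTRIAN RAMP", "house_number": "22", "street": "SAMPLE AVE"},
    {"violation": "PEDESTRIAN RAMP", "house_number": "10", "street": "EXAMPLE ST"},
]


def top_ramp_locations(tickets, n=3, violation="PEDESTRIAN RAMP"):
    """Return the n addresses with the most tickets for the given violation."""
    counts = Counter(
        (t["house_number"], t["street"])
        for t in tickets
        if t["violation"] == violation  # hypothetical label for the ramp violation
    )
    return counts.most_common(n)


if __name__ == "__main__":
    for (number, street), count in top_ramp_locations(SAMPLE_TICKETS):
        print(f"{number} {street}: {count} tickets")
```

A single address that collects an unusually large number of these tickets is the kind of spot worth checking by hand for a ramp that isn't attached to a crosswalk.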

The response the author received from the NYPD speaks volumes (an admission of a mistake and a promise to get it right with proper training):

Mr. Wellington’s analysis identified errors the department made in issuing parking summonses. It appears to be a misunderstanding by officers on patrol of a recent, abstruse change in the parking rules.  We appreciate Mr. Wellington bringing this anomaly to our attention.

The department’s internal analysis found that patrol officers who are unfamiliar with the change have observed vehicles parked in front of pedestrian ramps and issued a summons in error. When the rule changed in 2009 to allow for certain pedestrian ramps to be blocked by parked vehicles, the department focused training on traffic agents, who write the majority of summonses.

Yet, the majority of summonses written for this code violation were written by police officers. As a result, the department sent a training message to all officers clarifying the rule change and has communicated to commanders of precincts with the highest number of summonses, informing them of the issues within their command.

Thanks to this analysis and the availability of this open data, the department is also taking steps to digitally monitor these types of summonses to ensure that they are being issued correctly.

Worth reading in its entirety here.

On Learning Data Science

I’ve been learning more about data science in the last couple of months and recently stumbled upon a very good blog post from Dataquest on how to learn data science.

First, it’s important that there is some inherent motivation to learn data science:

Nobody ever talks about motivation in learning. Data science is a broad and fuzzy field, which makes it hard to learn. Really hard. Without motivation, you’ll end up stopping halfway through and believing you can’t do it, when the fault isn’t with you – it’s with the teaching.

You need something that will motivate you to keep learning, even when it’s midnight, formulas are starting to look blurry, and you’re wondering if this will be the night that neural networks finally make sense.

You need something that will make you find the linkages between statistics, linear algebra, and neural networks. Something that will prevent you from struggling with the “what do I learn next?” question.

My entry point to data science was predicting the stock market, although I didn’t know it at the time. Some of the first programs I coded to predict the stock market involved almost no statistics. But I knew they weren’t performing well, so I worked day and night to make them better.

There are good links throughout, including 100 data sets for statistics.

I like the suggestions on communicating your findings and/or your learning process:

Part of communicating insights is understanding the topic and theory well. Another part is understanding how to clearly organize your results. The final piece is being able to explain your analysis clearly.

It’s hard to get good at communicating complex concepts effectively, but here are some things you should try:

Start a blog. Post the results of your data analysis.

Try to teach your less tech-savvy friends and family about data science concepts. It’s amazing how much teaching can help you understand concepts…

More resources and links here.

On Facebook’s Massive Data Center near the Arctic

A fascinating look in Businessweek at Facebook’s data center in the Swedish town of Luleå (population 75,000), located about 70 miles from the Arctic Circle:

The heart of Facebook’s experiment lies just south of the Arctic Circle, in the Swedish town of Luleå. In the middle of a forest at the edge of town, the company in June opened its latest megasized data center, a giant building that comprises thousands of rectangular metal panels and looks like a wayward spaceship. By all public measures, it’s the most energy-efficient computing facility ever built, a colossus that helps Facebook process 350 million photographs, 4.5 billion “likes,” and 10 billion messages a day. While an average data center needs 3 watts of energy for power and cooling to produce 1 watt for computing, the Luleå facility runs nearly three times cleaner, at a ratio of 1.04 to 1. “What Facebook has done to the hardware market is dramatic,” says Tom Barton, the former chief executive officer of server maker Rackable Systems (SGI). “They’re putting pressure on everyone.”
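The "nearly three times" figure is easy to check. Here is a quick Python sketch, assuming (my reading, not something the article spells out) that each ratio means total watts drawn per watt delivered to computing:

```python
# Ratios quoted in the article: total watts drawn per watt of computing.
AVERAGE_RATIO = 3.0   # "an average data center"
LULEA_RATIO = 1.04    # the Luleå facility


def improvement_factor(baseline_ratio, new_ratio):
    """How many times less total power the new facility draws per computing watt."""
    return baseline_ratio / new_ratio


def overhead_watts(ratio):
    """Watts spent on everything other than computing, per computing watt."""
    return ratio - 1.0


if __name__ == "__main__":
    print(f"Improvement: about {improvement_factor(AVERAGE_RATIO, LULEA_RATIO):.2f}x")
    print(f"Overhead, average: {overhead_watts(AVERAGE_RATIO):.2f} W per computing watt")
    print(f"Overhead, Luleå: {overhead_watts(LULEA_RATIO):.2f} W per computing watt")
```

Under that reading the total draw falls by a factor of roughly 2.9, and the non-computing overhead shrinks from 2 watts to about 0.04 watts per watt of computing.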

There’s a reason why they chose this place:

The location has a lot to do with the system’s efficiency. Sweden has a vast supply of cheap, reliable power produced by its network of hydroelectric dams. Just as important, Facebook has engineered its data center to turn the frigid Swedish climate to its advantage. Instead of relying on enormous air-conditioning units and power systems to cool its tens of thousands of computers, Facebook allows the outside air to enter the building and wash over its servers, after the building’s filters clean it and misters adjust its humidity. Unlike a conventional, warehouse-style server farm, the whole structure functions as one big device.

To simplify its servers, which are used mostly to create Web pages, Facebook’s engineers stripped away typical components such as extra memory slots and cables and protective plastic cases. The servers are basically slimmed-down, exposed motherboards that slide into a fridge-size rack. The engineers say this design means better airflow over each server. The systems also require less cooling, because with fewer components they can function at temperatures as high as 85F. (Most servers are expected to keel over at 75F.)

Now you know where those photos and messages are stored!

Your E-Book Is Reading You

With the proliferation of e-books, publishers are using data analytics to determine what and how people are reading on their e-readers. The Wall Street Journal provides some detail:

Barnes & Noble, which accounts for 25% to 30% of the e-book market through its Nook e-reader, has recently started studying customers’ digital reading behavior. Data collected from Nooks reveals, for example, how far readers get in particular books, how quickly they read and how readers of particular genres engage with books. Jim Hilt, the company’s vice president of e-books, says the company is starting to share their insights with publishers to help them create books that better hold people’s attention.

Some details on which books tend to get dropped by readers:

Barnes & Noble has determined, through analyzing Nook data, that nonfiction books tend to be read in fits and starts, while novels are generally read straight through, and that nonfiction books, particularly long ones, tend to get dropped earlier. Science-fiction, romance and crime-fiction fans often read more books more quickly than readers of literary fiction do, and finish most of the books they start. Readers of literary fiction quit books more often and tend to skip around between books.

Those insights are already shaping the types of books that Barnes & Noble sells on its Nook. Mr. Hilt says that when the data showed that Nook readers routinely quit long works of nonfiction, the company began looking for ways to engage readers in nonfiction and long-form journalism. They decided to launch “Nook Snaps,” short works on topics ranging from weight loss and religion to the Occupy Wall Street movement.

Not very surprising, I suppose. I’d be interested in finding out what the criteria for a drop are: is it starting to read another book? No change in page numbers in a week? Longer?
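To make the question concrete, here is one hypothetical rule written out in Python: a book counts as dropped if it is unfinished and hasn't been opened for a set number of days. This is purely my own guess at a definition. Nothing in the article says how Barnes & Noble actually decides, and the function name, parameters, and one-week default are all invented for illustration.

```python
from datetime import date, timedelta


def is_dropped(last_page, total_pages, last_read, today, idle_days=7):
    """Hypothetical rule: unfinished and untouched for at least idle_days."""
    unfinished = last_page < total_pages
    idle = (today - last_read) >= timedelta(days=idle_days)
    return unfinished and idle


if __name__ == "__main__":
    today = date(2012, 7, 10)
    # Stalled at page 120 of 400, last opened nine days ago.
    print(is_dropped(120, 400, date(2012, 7, 1), today))
    # Same book, but opened two days ago.
    print(is_dropped(120, 400, date(2012, 7, 8), today))
```

Changing `idle_days` changes which books count as dropped, which is exactly why the definition matters before drawing conclusions from the numbers.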

Another thing to consider: giving readers what they want based on analytics can backfire. Weigh the sense of accomplishment a reader feels after finishing a longer book than they would otherwise have picked up against a publisher telling authors to limit how much and what they put on the page. As one astute publisher noted: “We’re not going to shorten War and Peace because someone didn’t finish it.”

For Factual, The World Is One Big Data Problem

This is a very interesting article about Gil Elbaz, a Caltech graduate, and Factual, the company he founded:

Geared to both big companies and smaller software developers, it includes available government data, terabytes of corporate data and information on 60 million places in 50 countries, each described by 17 to 40 attributes. Factual knows more than 800,000 restaurants in 30 different ways, including location, ownership and ratings by diners and health boards. It also contains information on half a billion Web pages, a list of America’s high schools and data on the offices, specialties and insurance preferences of 1.8 million United States health care professionals. There are also listings of 14,000 wine grape varietals, of military aircraft accidents from 1950 to 1974, and of body masses of major celebrities. Odd facts matter too, Mr. Elbaz notes.

He keeps 500 terabytes of storage near Factual’s headquarters. That’s about twice the amount needed to hold the entire Library of Congress. He has more data stored inside Amazon’s giant cloud of computers. His statisticians have cleaned and corrected data to account for things like how different health departments score sanitation, whether the term “middle school” means two years or three in a particular town, and whether there were revisions between an original piece of data and its duplicate.

A quote from Mr. Elbaz: “Having money is overrated when you are brought up not to believe you are entitled to it…You can make enough money to not need things, or you can just not need things.”

###

Related: Stephen Wolfram on Personal Data Analytics

Why Young People go into Finance, Law, and Consulting

Tyler Cowen has a simple theory of why young people tend to go into law, finance, and consulting:

The age structure of achievement is being ratcheted upward, due to specialization and the growth of knowledge.  Mathematicians used to prove theorems at age 20, now it happens at age 30, because there is so much to learn along the way.  If you are a smart 22-year-old, just out of Harvard, you probably cannot walk into a widget factory and quickly design a better machine.  (Note that in “immature” economic sectors, such as social networks circa 2006, young people can and do make immediate significant contributions and indeed they dominated the sector.)  Yet you and your parents expect you to earn a high income — now — and to affiliate with other smart, highly educated people, maybe even marry one of them.  It won’t work to move to Dayton and spend four years studying widget machines.

You will seek out jobs which reward a high “G factor,” or high general intelligence.  That means finance, law, and consulting.  You are productive fairly quickly, you make good contacts with other smart people, and you can demonstrate that you are smart, for future employment prospects.

Combine that with the fact that these jobs tend to pay more than anything else available, and we’ve got a recipe for young people to pass up opportunities in technology, public service, and the like. This New York Times piece offers some data on the percentage of Ivy League graduates who went directly into finance jobs. For example, Harvard graduates were more likely to enter finance than any other career (in fact, 17 percent of new grads did so in 2010, down from 28 percent in 2008, just before the financial crisis).