On Learning Data Science

I’ve been learning more about data science in the last couple of months and recently stumbled upon a very good blog post from Dataquest on how to learn data science.

First, it’s important that there is some inherent motivation to learn data science:

Nobody ever talks about motivation in learning. Data science is a broad and fuzzy field, which makes it hard to learn. Really hard. Without motivation, you’ll end up stopping halfway through and believing you can’t do it, when the fault isn’t with you – it’s with the teaching.

You need something that will motivate you to keep learning, even when it’s midnight, formulas are starting to look blurry, and you’re wondering if this will be the night that neural networks finally make sense.

You need something that will make you find the linkages between statistics, linear algebra, and neural networks. Something that will prevent you from struggling with the “what do I learn next?” question.

My entry point to data science was predicting the stock market, although I didn’t know it at the time. Some of the first programs I coded to predict the stock market involved almost no statistics. But I knew they weren’t performing well, so I worked day and night to make them better.

There are good links throughout, including 100 data sets for statistics.

I like the suggestions on communicating your findings and/or your learning process:

Part of communicating insights is understanding the topic and theory well. Another part is understanding how to clearly organize your results. The final piece is being able to explain your analysis clearly.

It’s hard to get good at communicating complex concepts effectively, but here are some things you should try:

Start a blog. Post the results of your data analysis.

Try to teach your less tech-savvy friends and family about data science concepts. It’s amazing how much teaching can help you understand concepts…

More resources and links here.

JPMorgan Chase Has More than 1,000 Models in Production

This afternoon, I spent some time reviewing the annual shareholder letter from JPMorgan Chase. The most interesting bit to me was this section on Model Risk Management (“Model review”) at the Bank:

More than 300 employees are working in Model Risk and Development. In 2014, this highly specialized team completed over 500 model reviews, implemented a system to assess the ongoing performance of the 1,000+ most complex models in the firm, and continued to enhance capital and loss models for our company.

So there are at least 1,000 models currently in production at JPMorgan Chase, and that count doesn’t include the non-complex models…

I also thought Jamie Dimon’s comments on the Comprehensive Capital Analysis and Review (CCAR) were illuminating:

We believe that we would perform far better under the Fed’s stress scenario than the Fed’s stress test implies. Let me be perfectly clear – I support the Fed’s stress test, and we at JPMorgan Chase think that it is important that the Fed stress test each bank the way it does. But it also is important for our shareholders to understand the difference between the Fed’s stress test and what we think actually would happen. Here are a few examples of where we are fairly sure we would do better than the stress test would imply:

  • We would be far more aggressive on cutting expenses, particularly compensation, than the stress test allows.
  • We would quickly cut our dividend and stock buyback programs to conserve capital. In fact, we reduced our dividend dramatically in the first quarter of 2009 and stopped all stock buybacks in the first quarter of 2008.
  • We would not let our balance sheet grow quickly. And if we made an acquisition, we would make sure we were properly capitalized for it. When we bought Washington Mutual (WaMu) in September of 2008, we immediately raised $11.5 billion in common equity to protect our capital position. There is no way we would make an acquisition that would leave us in a precarious capital position.
  • And last, our trading losses would unlikely be $20 billion as the stress test shows. The stress test assumes that dramatic market moves all take place on one day and that there is very little recovery of values. In the real world, prices drop over time, and the volatility of prices causes bid/ask spreads to widen – which helps marketmakers. In a real-world example, in the six months after the Lehman Brothers crisis, J.P. Morgan’s actual trading results were $4 billion of losses – a significant portion of which related to the Bear Stearns acquisition – which would not be repeated. We also believe that our trading exposures are much more conservative today than they were during the crisis.

The last point is important because of how the CCAR scenarios have worked in recent years: they assume a one-time, massive shock to the equity markets (a 50 to 60% drop in the severely adverse case) that plays out over anywhere from one day to less than a month.
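To make the contrast between an instantaneous shock and a drawn-out decline concrete, here is a minimal sketch in Python. It is a toy illustration, not how CCAR or JPMorgan Chase actually computes trading losses. The function names, the 100-unit exposure, the 10-step decline, and the 10% per-step position cut are all hypothetical assumptions, and the sketch ignores bid/ask spreads, hedges, and transaction costs.

```python
def instantaneous_shock_loss(exposure, drop):
    """Loss when the entire drop hits at once, with no chance to reduce risk."""
    return exposure * drop


def gradual_decline_loss(exposure, drop, steps, cut_per_step):
    """Loss when the same total drop arrives in equal compounded steps.

    After each step the holder sells a fixed fraction of the remaining
    position at the current price (a hypothetical de-risking rule).
    """
    # Per-step return chosen so that `steps` of them compound to (1 - drop).
    step_return = (1.0 - drop) ** (1.0 / steps) - 1.0
    position = exposure
    loss = 0.0
    for _ in range(steps):
        loss += position * -step_return   # mark-to-market loss this step
        position *= 1.0 + step_return     # position value after the price move
        position *= 1.0 - cut_per_step    # sell part of what is left
    return loss


if __name__ == "__main__":
    exposure = 100.0  # hypothetical equity exposure
    drop = 0.5        # 50% drop, the low end of the range mentioned above
    print(instantaneous_shock_loss(exposure, drop))
    print(gradual_decline_loss(exposure, drop, steps=10, cut_per_step=0.1))
```

With no position cuts, the gradual path loses exactly as much as the one-day shock. Any positive cut shrinks the loss, because later price moves apply to a smaller position. That is the mechanical core of the argument that a one-day shock overstates what would happen if prices fell over time.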