Do the Advantages of Deep Learning Justify the Cost?

This year’s STRATA Data Conference in San Jose was a great opportunity to discover new ideas, interact with fellow data science practitioners, and catch up on rising trends within the machine learning/AI world.

A lot of focus (and many talks) centered on deep learning and its applications across different industries. From more heavily regulated sectors, where deep learning is used to inform and improve simpler model constructs, to the most advanced and unrestricted use cases shown by tech companies, deep learning stands as a promising way to reduce feature engineering time and substantially boost model performance. To this end, several companies came prepared with a variety of empirical results.

In particular, one algorithm that received a lot of attention was the LSTM (Long Short-Term Memory) neural network. LSTMs are designed to recognize patterns in sequence data, such as text, spoken language, or numerical time series. In this context, Teradata proposed the application of LSTMs to predict loan delinquencies and defaults on a dataset provided by Wells Fargo. The results showed a clear performance gain over the legacy logistic regression benchmark, largely attributable to the algorithm’s inherent ability to learn more complex function mappings between the model inputs and the target output.
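
To make the idea concrete, the sketch below shows what such a model might look like in Keras. The input shape, layer sizes, and training details are illustrative assumptions on my part, not details from the Teradata talk.

```python
# Hypothetical sketch: an LSTM that reads a sequence of monthly account
# features and outputs a probability of delinquency/default.
# All shapes and hyper-parameters below are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

n_timesteps, n_features = 24, 16  # e.g. 24 months of 16 account-level features

model = keras.Sequential([
    # The LSTM layer learns temporal patterns across the 24-month sequence
    layers.LSTM(64, input_shape=(n_timesteps, n_features)),
    # A sigmoid head turns the learned representation into a default probability
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])

# X_train: (n_loans, 24, 16) sequences, y_train: (n_loans,) binary labels
# model.fit(X_train, y_train, validation_split=0.2, epochs=10, batch_size=256)
```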

Similarly, Microsoft used an LSTM framework to boost performance by 10.3% in a predictive maintenance problem, where multivariate time-series data was used to predict the failure of mechanical components in aircraft engines.
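
A large part of the work in problems like this is reshaping raw sensor logs into fixed-length sequences the network can consume. The helper below is a minimal sketch of that step; the window length, prediction horizon, and labelling rule are my own assumptions, not details from the Microsoft talk.

```python
# Hypothetical preprocessing sketch: slice one engine's multivariate sensor
# readings into fixed-length windows, labelled 1 when the engine fails
# within `horizon` cycles of the window's end. Window/horizon are assumptions.
import numpy as np

def make_windows(sensors: np.ndarray, failure_cycle: int,
                 window: int = 50, horizon: int = 30):
    """sensors: (n_cycles, n_sensors) readings for a single engine."""
    X, y = [], []
    for end in range(window, sensors.shape[0] + 1):
        X.append(sensors[end - window:end])            # the last `window` cycles
        y.append(int(failure_cycle - end <= horizon))  # failure imminent?
    return np.stack(X), np.array(y)

# windows, labels = make_windows(engine_readings, failure_cycle=192)
# windows.shape -> (n_windows, 50, n_sensors), ready for an LSTM input layer
```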

Uber also demonstrated how LSTMs could be used for an unsupervised learning task in a production environment for app failure alerts. The algorithm collects streams of time-series data and produces a forecast with an associated confidence region. When the actual value of the target exceeds the confidence bounds, the system automatically triggers an app failure alert to the engineering team.
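
The decision rule itself is simple to express. Below is a minimal, self-contained sketch of that alerting logic; a rolling mean and standard deviation stand in for the LSTM forecast and its uncertainty estimate, and the threshold is an assumption rather than Uber’s actual configuration.

```python
# Minimal sketch of the "alert when the observation leaves the confidence
# region" rule. A rolling mean/std stands in for the LSTM point forecast and
# its uncertainty; the z threshold and the metric are illustrative assumptions.
import numpy as np

def check_for_anomaly(history: np.ndarray, actual: float, z: float = 3.0) -> bool:
    """Return True (i.e. raise an alert) when `actual` falls outside the
    confidence region built around the forecast of the recent history."""
    forecast = history.mean()           # stand-in for the model's point forecast
    band = z * history.std(ddof=1)      # stand-in for its uncertainty estimate
    lower, upper = forecast - band, forecast + band
    return not (lower <= actual <= upper)

# recent = np.array([120.0, 118.5, 121.2, 119.8, 120.4])  # e.g. completed trips/min
# if check_for_anomaly(recent, actual=42.0):
#     print("trigger an app-failure alert to the engineering team")
```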

So, whether the problem is supervised or unsupervised, classification or regression, cross-sectional or time series, there seems to be a deep learning framework that can be architected to deliver these operational and performance benefits.

But the reality is that these benefits come at a cost. The increased complexity of these algorithms results in more demanding and time-consuming hyper-parameter tuning, and requires more powerful infrastructure to support the computation. Nor is the cost limited to training: interpreting how the model arrived at a given result becomes more opaque, and attempting an explanation requires yet another challenging computational step. Finally, when the time comes to deploy into production, these models inherently entail larger memory requirements and longer runtimes to output a score.

One of the keynote speeches that resonated with me was offered by Dinesh Nirmal, VP of Analytics Development at IBM. Dinesh highlighted how, in today’s enterprise environment, the hardest part of the data science lifecycle is deploying models into production. To reinforce this point, he mentioned a quote from the CTO of a major bank, who said it took him three weeks to build a model and, after 11 months, the model had yet to be deployed.

Deep learning models are definitely among the most challenging to deploy, especially when the input data arrives as a stream and a response is required within milliseconds. In the aforementioned Uber case study, while the time-series data is available as a stream, the unsupervised LSTM forecast is produced, at best, within a minute. But a minute is a very long time in the world of real-time platforms.

So, are the operational and performance advantages of deep learning frameworks enough to offset the extra cost in terms of time, infrastructure, and resources? I believe that in many cases, the answer will be far from obvious.


STRATA Data Conference photo courtesy of O’Reilly via Flickr