18 Jun Interpreting AI Is More Than Black And White
Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke 
In the world of software development, there are well-defined testing paradigms and use-cases. Many teams write suites of black-box tests, which treat the internals of the program as unknown and assert only that the data output from the system is as expected given the data fed in. Conversely, in white-box tests the structure, design, and implementation of the program being tested are known.
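As a rough illustration of the two paradigms, here is a minimal Python sketch. The function `shipping_cost` and its pricing rule are hypothetical, invented only for this example; the point is that the first test relies only on inputs and outputs, while the second is written with knowledge of the implementation.

```python
def shipping_cost(weight_kg):
    """Hypothetical function under test (pricing rule invented for illustration)."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1.0:
        # Internal branch: flat rate for light parcels.
        return 5.0
    return 5.0 + 2.0 * (weight_kg - 1.0)


def test_black_box():
    # Only the inputs and the expected outputs are known.
    assert shipping_cost(0.5) == 5.0
    assert shipping_cost(3.0) == 9.0


def test_white_box():
    # Written with knowledge of the code: probe the branch boundary
    # at exactly 1.0 kg and the explicit error path.
    assert shipping_cost(1.0) == 5.0
    assert shipping_cost(1.0 + 1e-9) > 5.0
    try:
        shipping_cost(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-positive weight")


test_black_box()
test_white_box()
```

The black-box test would survive a complete rewrite of the function as long as its behavior is unchanged; the white-box test is tied to the current branching logic.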
In the world of artificial intelligence & machine learning (AI & ML), the black- and white-box categorization of models and algorithms refers to their interpretability. That is, given a model trained to map data inputs to outputs (e.g. emails classified as spam or not), if the mechanisms underlying its predictions cannot be inspected or understood, that model is black-box. And just as the software testing dichotomy contrasts high-level behavior with low-level logic, only white-box AI methods can be readily interpreted to see the logic behind a model’s predictions.
In recent years, with machine learning taking over new industries and applications where users far outnumber the experts who understand the models and algorithms, the conversation around interpretability has become an important one. This is all the more true with the rise of deep learning, a class of machine learning methods known for their powerful performance yet elusive, black-box nature. It is a conversation with implications up and down the stack, from AI scientists to CEOs.
What is interpretable AI?
The definition of interpretable AI isn’t exactly black and white. To have a productive conversation, it’s essential to be clear about what model interpretability means to different stakeholders:
- The ability to explain a model’s behavior to an ML engineer, answering “why did the model predict that?” in technical terms. For example: the prior on variable alpha must not be Gaussian, as we can see in the misaligned posterior predictive check.
- The ability to relate a model to business objectives, answering “why did the model predict that?” in natural language. For example: the predicted spike in insulin levels is correlated with the recent prolonged inactivity picked up by the fitness watch.
Both definitions are clearly useful. The low-level notion of interpretability supports the engineer’s ability to develop and debug models and algorithms. High-level transparency and explainability are just as necessary for humans to understand and trust predictions in areas like financial markets and medicine [4,5].
No matter the definition, developing an interpretable AI system is typically challenging and ambiguous. A model or algorithm is often too complex to understand or describe precisely because its purpose is to model a complex hypothesis or navigate a high-dimensional space: a catch-22. Moreover, what is interpretable in one application may be useless in another.
Peering inside the box
Consider the visualizations below of deep Gaussian Processes, a probabilistic variety of deep neural networks. The intent of this experiment by Duvenaud et al. is to elucidate a subtle architectural flaw that emerges in neural networks as more and more layers are added; the specific insights are beyond the scope of this article (see the paper for details). Even with the paper’s mathematical details of the model architecture and its description of the resulting adverse effects, the “pathology” is hard to understand intuitively, let alone debug and fix. The visualizations are quite effective in communicating modeling insights that would otherwise go unnoticed.
Post-hoc interpretation methods can provide insights into otherwise black-box AI systems. Nonetheless, the underlying models and algorithms may still be unexplainable. What, then, is a white-box model? The counterpart to post-hoc interpretation is model-based interpretability, where the model itself readily provides insights into the relationships and structures it learns from data. For example, a Gaussian Process is a flexible Bayesian model that supports feature engineering and the incorporation of domain expertise, and its predictions are intuitive to trace back to the underlying logic, all properties of white-box ML. A Gaussian Process model can be powerful for many real-world machine learning tasks, from financial markets to autonomous robotics [8, 10]. Sometimes stacking Gaussian Processes into a complex network yields a significant performance boost, but it also shifts the resulting model towards the black-box end of the spectrum. Developing an interpretable machine learning system can therefore mean building models for descriptive accuracy at the cost of predictive accuracy.
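To make the idea of tracing a Gaussian Process prediction back to its logic concrete, here is a minimal sketch in plain Python. It assumes a zero-mean Gaussian Process with a squared-exponential kernel and fixed, hand-picked hyperparameters; the data and all function names are hypothetical. Under those assumptions the predicted mean is a sum of one term per training point, so each point's contribution can be read off directly.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel: similarity decays with distance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_contributions(x_train, y_train, x_new, length_scale=1.0, noise=1e-6):
    """Per-training-point terms of the GP posterior mean at x_new.

    The posterior mean is sum_i alpha_i * k(x_new, x_i),
    where alpha = (K + noise * I)^-1 y.
    """
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve(K, y_train)
    return [a * rbf(x_new, xi, length_scale) for a, xi in zip(alpha, x_train)]


# Hypothetical toy data.
x_train = [0.0, 1.0, 4.0]
y_train = [1.0, 2.0, -1.0]

contribs = gp_contributions(x_train, y_train, x_new=1.0)
prediction = sum(contribs)
for xi, c in zip(x_train, contribs):
    print(f"training point x={xi}: contribution {c:+.3f}")
print(f"predicted mean at x=1.0: {prediction:.3f}")
```

The length scale is also readable: it states how far apart two inputs can be before the model stops treating them as related. A stacked, deep variant would not decompose this cleanly, which is the trade-off described above.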
Do visualization methods make black-box models interpretable? Not quite. Even with advanced interpretation techniques such as Google’s “Inceptionism”, deep neural networks remain prohibitively complex to understand: the explanations underlying predictions (the “why”) are unknown. Consider the remarkable sensitivity of deep neural networks to adversarial attacks, where slightly perturbed inputs (changes that are often imperceptible to humans, or as innocuous as a piece of duct tape on a stop sign) can completely throw off predictions, raising questions about what the models are actually learning inside that black box.
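The mechanics of such a perturbation can be sketched on a much simpler model. The example below is a toy, assuming a hypothetical 100-feature linear classifier rather than a deep network: every feature is nudged by the same small amount in the direction that lowers the score (a gradient-sign style step, since the gradient of a linear score with respect to the input is just the weight vector). The weights, inputs, and step size are all invented for illustration.

```python
def predict(weights, x):
    """Toy linear classifier: class 1 if the weighted sum is positive."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


d = 100  # number of input features (hypothetical)
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(d)]
x = [0.51 if i % 2 == 0 else 0.49 for i in range(d)]

# Step each feature by epsilon against the sign of its weight.
epsilon = 0.02
x_adv = [xi - epsilon * sign(w) for xi, w in zip(x, weights)]

largest_change = max(abs(a - b) for a, b in zip(x, x_adv))
print("original prediction: ", predict(weights, x))
print("perturbed prediction:", predict(weights, x_adv))
print("largest per-feature change:", round(largest_change, 4))
```

No single feature moves by more than about 0.02, yet the small changes add up across all 100 dimensions and flip the predicted class. Deep networks are far less transparent than this toy, but the accumulation of many small changes in a high-dimensional input is one commonly offered intuition for their sensitivity.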
The real issues with AI interpretability
Even with improved methods and…