arXiv:2010.09345 [cs.LG]

A Framework to Learn with Interpretation

Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc

Published 2020-10-19, Version 1

With the increasingly widespread use of deep neural networks in critical decision-making applications, the interpretability of these models is becoming imperative. We consider the problem of jointly learning a predictive model and its associated interpretation model. The task of the interpreter is to provide both local and global interpretability of the predictive model in terms of human-understandable, high-level attribute functions, without any loss of accuracy. This is achieved through a dedicated architecture and well-chosen regularization penalties. We seek a small-sized dictionary of attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose a high level of conciseness by constraining the activation of only a few attributes for a given input with an entropy-based criterion, while enforcing fidelity to both the inputs and outputs of the predictive model. A major advantage of simultaneous learning is that the predictive neural network benefits from the interpretability constraint as well. We also develop a detailed pipeline, based on both common and novel simple tools, to build an understanding of the learnt attribute functions. On two datasets, MNIST and QuickDraw, we show their relevance for both global and local interpretability.
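As a rough illustration of the architecture described in the abstract, the sketch below pairs a small convolutional predictor with an interpreter that computes a dictionary of attribute activations from selected hidden layers and feeds them to a linear classifier; the joint objective combines the prediction loss, fidelity of the interpreter's output to the predictor's output, and an entropy-style conciseness penalty on the attribute activations. This is a minimal PyTorch sketch under assumed layer sizes and loss weights, not the authors' implementation: the names (FLINTSketch, attr, g_head) and the 0.1 penalty weight are hypothetical, and the input-fidelity term (e.g. a decoder reconstructing the input from the attributes) is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FLINTSketch(nn.Module):
    """Joint predictor + interpreter (illustrative sketch, not the paper's code).

    The interpreter maps outputs of selected hidden layers to a small
    dictionary of attribute activations, which feed a linear classifier.
    Layer sizes assume 28x28 single-channel inputs (e.g. MNIST).
    """

    def __init__(self, num_classes=10, num_attributes=24):
        super().__init__()
        # Predictive network f: two conv blocks, then a classifier head.
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.f_head = nn.Linear(64 * 7 * 7, num_classes)
        # Interpreter g: attribute functions over selected hidden layers,
        # followed by a linear classifier on the attribute activations.
        self.attr = nn.Linear(32 * 14 * 14 + 64 * 7 * 7, num_attributes)
        self.g_head = nn.Linear(num_attributes, num_classes)

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        y_f = self.f_head(h2.flatten(1))                       # predictor output
        hidden = torch.cat([h1.flatten(1), h2.flatten(1)], 1)  # selected hidden layers
        a = F.relu(self.attr(hidden))                          # attribute activations
        y_g = self.g_head(a)                                   # interpreter output
        return y_f, y_g, a

def joint_loss(y_f, y_g, a, target):
    """Illustrative joint objective: prediction loss, output fidelity,
    and an entropy-style conciseness penalty on attribute activations."""
    pred = F.cross_entropy(y_f, target)
    # Output fidelity: the interpreter should mimic the predictor.
    fid = F.kl_div(F.log_softmax(y_g, -1), F.softmax(y_f, -1),
                   reduction="batchmean")
    # Conciseness: low entropy of the normalized activations pushes
    # each input to activate only a few attributes.
    p = a / (a.sum(dim=1, keepdim=True) + 1e-8)
    conc = -(p * (p + 1e-8).log()).sum(dim=1).mean()
    return pred + fid + 0.1 * conc

A training step would then minimize joint_loss(*model(x), y) over batches; the fidelity and conciseness weights would need tuning per dataset, and the paper's actual losses and architecture should be taken from the article itself.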

Related articles:
arXiv:1811.10469 [cs.LG] (Published 2018-11-21)
How to improve the interpretability of kernel learning
arXiv:1910.03081 [cs.LG] (Published 2019-10-07)
On the Interpretability and Evaluation of Graph Representation Learning
arXiv:2001.02522 [cs.LG] (Published 2020-01-08)
On Interpretability of Artificial Neural Networks