arXiv:2410.11444 [cs.LG]

On Championing Foundation Models: From Explainability to Interpretability

Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao

Published 2024-10-15 (Version 1)

Understanding the inner mechanisms of black-box foundation models (FMs) is essential yet challenging in artificial intelligence and its applications. Over the last decade, the dominant focus has been on their explainability, leading to post-hoc explainable methods that rationalize the specific decisions already made by black-box FMs. However, these explainable methods have limitations in terms of faithfulness, detail capture, and resource requirements. In response to these issues, a new class of interpretable methods should be considered, one that unveils the underlying mechanisms in an accurate, comprehensive, heuristic, and resource-light way. This survey reviews interpretable methods that comply with these principles and have been successfully applied to FMs. These methods are deeply rooted in machine learning theory, covering the analysis of generalization performance, expressive capability, and dynamic behavior. They provide a thorough interpretation of the entire workflow of FMs, from inference capability and training dynamics to ethical implications. Drawing upon these interpretations, this review identifies the next frontier of research directions for FMs.

Related articles:
arXiv:2001.02522 [cs.LG] (Published 2020-01-08)
On Interpretability of Artificial Neural Networks
arXiv:2403.06425 [cs.LG] (Published 2024-03-11)
A Differential Geometric View and Explainability of GNN on Evolving Graphs
arXiv:1811.10469 [cs.LG] (Published 2018-11-21)
How to improve the interpretability of kernel learning