arXiv:2305.12143 [cs.LG]

Learning Horn Envelopes via Queries from Large Language Models

Sophie Blum, Raoul Koudijs, Ana Ozaki, Samia Touileb

Published 2023-05-20 (Version 1)

We investigate an approach for extracting knowledge from trained neural networks based on Angluin's exact learning model, in which a learner poses membership and equivalence queries to an oracle. In this approach, the oracle is a trained neural network. We consider Angluin's classical algorithm for learning Horn theories and study the changes needed to make it applicable to learning from neural networks. In particular, we must account for the fact that trained neural networks may not behave as Horn oracles: their underlying target theory may not be Horn. We propose a new algorithm that aims to extract the "tightest Horn approximation" of the target theory and that is guaranteed to terminate in exponential time in the worst case, and in polynomial time if the target has polynomially many non-Horn examples. To showcase the applicability of the approach, we perform experiments on pre-trained language models and extract rules that expose occupation-based gender biases.
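The "tightest Horn approximation" (Horn envelope) has a simple semantic characterization that the abstract leans on: a propositional theory is Horn if and only if its set of models is closed under componentwise intersection, so the envelope's models are exactly the intersection-closure of the target's models. The sketch below illustrates this definition by brute force over all assignments; it is not the query-based algorithm proposed in the paper, and the function names are my own.

```python
from itertools import product

def horn_envelope_models(f, n):
    """Models of the tightest Horn theory implied by f (its Horn envelope).

    A propositional theory is Horn iff its model set is closed under
    componentwise intersection (bitwise AND), so the envelope's models
    are the intersection-closure of f's models. Brute force over all
    2^n assignments, for illustration only.
    """
    closed = {x for x in product((0, 1), repeat=n) if f(x)}
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                c = tuple(x & y for x, y in zip(a, b))
                if c not in closed:
                    closed.add(c)
                    changed = True
    return closed

# XOR is not Horn: intersecting its models (0,1) and (1,0) yields (0,0),
# which XOR rejects. The envelope adds that model, giving the Horn
# theory (not x) or (not y).
xor = lambda v: v[0] ^ v[1]
print(sorted(horn_envelope_models(xor, 2)))  # [(0, 0), (0, 1), (1, 0)]
```

The XOR example shows why a non-Horn oracle forces an approximation: any Horn theory consistent with XOR's positive examples must also admit the all-zero model, and the envelope is the least such over-approximation.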

Comments: 35 pages, 2 figures; submitted to the International Journal of Approximate Reasoning (IJAR)
Categories: cs.LG, cs.LO
Subjects: I.2.6, I.2.4
Related articles:
arXiv:2205.12615 [cs.LG] (Published 2022-05-25)
Autoformalization with Large Language Models
arXiv:2305.15594 [cs.LG] (Published 2023-05-24)
Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
arXiv:2309.06236 [cs.LG] (Published 2023-09-12)
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models