{ "id": "2311.10098", "version": "v1", "published": "2023-10-31T17:44:04.000Z", "updated": "2023-10-31T17:44:04.000Z", "title": "Automated Parliaments: A Solution to Decision Uncertainty and Misalignment in Language Models", "authors": [ "Thomas Forster", "Jonathan Ouwerx", "Shak Ragoler" ], "comment": "39 pages, 4 figures", "categories": [ "cs.AI" ], "abstract": "As AI takes on a greater role in the modern world, it is essential to ensure that AI models can overcome decision uncertainty and remain aligned with human morality and interests. This research paper proposes a method for improving the decision-making of language models (LMs) via Automated Parliaments (APs) - constructs made of AI delegates each representing a certain perspective. Delegates themselves consist of three AI models: generators, modifiers, and evaluators. We specify two mechanisms for producing optimal solutions: the Simultaneous Modification mechanism for response creation and an evaluation mechanism for fairly assessing solutions. The overall process begins when each generator creates a response aligned with its delegate's theory. The modifiers alter all other responses to make them more self-aligned. The evaluators collectively assess the best end response. Finally, the modifiers and generators learn from feedback from the evaluators. In our research, we tested the evaluation mechanism, comparing the use of single-value zero-shot prompting and AP few-shot prompting in evaluating morally contentious scenarios. We found that the AP architecture saw a 57.3% reduction in its loss value compared to the baseline. We conclude by discussing some potential applications of APs and specifically their potential impact when implemented as Automated Moral Parliaments.", "revisions": [ { "version": "v1", "updated": "2023-10-31T17:44:04.000Z" } ], "analyses": { "keywords": [ "language models", "automated parliaments", "ai models", "misalignment", "evaluation mechanism" ], "note": { "typesetting": "TeX", "pages": 39, "language": "en", "license": "arXiv", "status": "editable" } } }