arXiv:2306.03438 Abstract | arXiv Analytics

arXiv:2306.03438 [cs.LG]

Large Language Models of Code Fail at Completing Code with Potential Bugs

Tuan Dinh, Jinman Zhao, Samson Tan, Renato Negrinho, Leonard Lausen, Sheng Zha, George Karypis

Published 2023-06-06, Version 1

Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs -- anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of the high-performing Code-LLMs. For instance, the passing rates of CodeGen-2B-mono on test cases of buggy-HumanEval drop more than 50% given a single potential bug in the context. Finally, we investigate several post-hoc methods for mitigating the adverse effect of potential bugs and find that there remains a large gap in post-mitigation performance.
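To make the idea of a "potential bug" from a semantics-altering operator change concrete, here is a minimal sketch, not the authors' dataset-construction code. The operator table `OPERATOR_SWAPS`, the function `inject_potential_bug`, and the example prefix are all hypothetical names chosen for illustration. The regex matching is deliberately naive and assumes the prefix has no operators inside strings, comments, or `->` annotations.

```python
import re

# Hypothetical swap table: each replacement changes program semantics.
# This is an illustrative assumption, not the operator set used in the paper.
OPERATOR_SWAPS = {
    "+=": "-=", "-=": "+=",
    "==": "!=", "!=": "==",
    "<=": ">", ">=": "<",
    "<": ">=", ">": "<=",
    "+": "-", "-": "+",
}

# Try longer operators first so "<=" is matched before "<".
_OP_PATTERN = re.compile(
    "|".join(re.escape(op) for op in sorted(OPERATOR_SWAPS, key=len, reverse=True))
)


def inject_potential_bug(prefix: str, occurrence: int = 0) -> str:
    """Swap one operator in a partial program (the code context).

    Returns the prefix unchanged if it has no operator at that index.
    """
    matches = list(_OP_PATTERN.finditer(prefix))
    if occurrence >= len(matches):
        return prefix
    m = matches[occurrence]
    return prefix[:m.start()] + OPERATOR_SWAPS[m.group()] + prefix[m.end():]


# A clean code context that a model would be asked to complete.
clean_prefix = (
    "def count_positive(values):\n"
    "    count = 0\n"
    "    for v in values:\n"
    "        if v > 0:\n"
)

buggy_prefix = inject_potential_bug(clean_prefix)
print(buggy_prefix)
```

In this sketch the condition `v > 0` becomes `v <= 0`. The prefix alone is not yet a failing program, but the natural completion (`count += 1`) would now count the wrong elements, which is the sense in which the context contains a potential bug.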

Comments: 25 pages

Categories: cs.LG, cs.AI, cs.CL, cs.SE

Keywords: large language models, code fail, completing code, code context contains potential bugs, single potential bug

Related articles:

arXiv:2305.05176 [cs.LG] (Published 2023-05-09)

FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Lingjiao Chen, Matei Zaharia, James Zou

arXiv:2309.10668 [cs.LG] (Published 2023-09-19)

Language Modeling Is Compression

Grégoire Delétang et al.

arXiv:2302.06692 [cs.LG] (Published 2023-02-13)

Guiding Pretraining in Reinforcement Learning with Large Language Models

Yuqing Du et al.
