arXiv:1912.12345 [cs.LG]

Synthetic Datasets for Neural Program Synthesis

Richard Shin, Neel Kant, Kavi Gupta, Christopher Bender, Brandon Trabucco, Rishabh Singh, Dawn Song

Published 2019-12-27, Version 1

The goal of program synthesis is to automatically generate programs in a particular language from corresponding specifications, e.g. input-output behavior. Many current approaches achieve impressive results after training on randomly generated I/O examples in limited domain-specific languages (DSLs), as with string transformations in RobustFill. However, we empirically discover that applying test input generation techniques for languages with control flow and rich input space causes deep networks to generalize poorly to certain data distributions; to correct this, we propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
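The abstract does not spell out how the bias of a synthetic distribution is controlled, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It uses a hypothetical toy calculator grammar (single-digit operands, the operators +, -, *) and treats the operator count as the salient variable. A naive recursive generator skews heavily toward short programs, while the controlled generator first draws the operator count uniformly and then builds a program with exactly that count. All function names here are invented for the example.

```python
import random
from collections import Counter

OPS = ["+", "-", "*"]  # hypothetical toy calculator operators


def naive_expr(rng, p_stop=0.6):
    """Grow an expression by repeated coin flips; short programs dominate."""
    if rng.random() < p_stop:
        return str(rng.randint(0, 9))
    left = naive_expr(rng, p_stop)
    right = naive_expr(rng, p_stop)
    return f"({left} {rng.choice(OPS)} {right})"


def expr_with_ops(rng, n_ops):
    """Build an expression containing exactly n_ops binary operators."""
    if n_ops == 0:
        return str(rng.randint(0, 9))
    left_ops = rng.randint(0, n_ops - 1)  # operators given to the left subtree
    right_ops = n_ops - 1 - left_ops      # the remainder goes to the right
    left = expr_with_ops(rng, left_ops)
    right = expr_with_ops(rng, right_ops)
    return f"({left} {rng.choice(OPS)} {right})"


def controlled_expr(rng, max_ops=4):
    """Draw the salient variable uniformly, then a program with that value."""
    return expr_with_ops(rng, rng.randint(0, max_ops))


def count_ops(expr):
    """Count binary operators; operands are non-negative, so '-' is never a sign."""
    return sum(expr.count(f" {op} ") for op in OPS)


if __name__ == "__main__":
    rng = random.Random(0)
    naive = Counter(count_ops(naive_expr(rng)) for _ in range(5000))
    controlled = Counter(count_ops(controlled_expr(rng)) for _ in range(5000))
    print("naive operator counts:     ", sorted(naive.items()))
    print("controlled operator counts:", sorted(controlled.items()))
```

Comparing the two histograms shows the point of the exercise: a model trained only on the naive samples sees few long programs, whereas the controlled samples cover each operator count about equally often. The same idea could in principle be applied to properties of the specifications (for example, the distribution of input values), which this sketch does not attempt.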

Comments: ICLR 2019

Categories: cs.LG, cs.AI, cs.PL, stat.ML

Keywords: neural program synthesis, synthetic datasets, deep networks, test input generation techniques, data distributions

Related articles:

arXiv:1903.04991 [cs.LG] (Published 2019-03-12)

Theory III: Dynamics and Generalization in Deep Networks

Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Bob Liang, Jack Hidary, Tomaso Poggio

arXiv:1602.04484 [cs.LG] (Published 2016-02-14)

Dropout Versus Weight Decay for Deep Networks

David P. Helmbold, Philip M. Long

arXiv:1906.00150 [cs.LG] (Published 2019-06-01)

Sparsity Normalization: Stabilizing the Expected Outputs of Deep Networks