arXiv Analytics


arXiv:2203.17250 [cs.LG]

Generation and Simulation of Synthetic Datasets with Copulas

Regis Houssou, Mihai-Cezar Augustin, Efstratios Rappos, Vivien Bonvin, Stephan Robert-Nicoud

Published 2022-03-30, Version 1

This paper proposes a new method for generating synthetic datasets based on copula models. Our goal is to produce surrogate data that resemble real data in terms of both marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic dataset comprising numeric or categorical variables. Applied to two datasets, our methodology performs better than other methods such as SMOTE and autoencoders.
