arXiv:2202.11789 [cs.LG]

Investigating the effect of binning on causal discovery

Andrew Colt Deckert, Erich Kummerfeld

Published 2022-02-23, Version 1

Binning (a.k.a. discretization) of numerically continuous measurements is a widespread but controversial practice in data collection, analysis, and presentation. The consequences of binning have been evaluated for many kinds of data analysis methods; however, the effect of binning on causal discovery algorithms has not so far been directly investigated. This paper reports the results of a simulation study that examined the effect of binning on the Greedy Equivalence Search (GES) causal discovery algorithm. Our findings suggest that unbinned continuous data often result in the highest search performance, but some exceptions are identified. We also found that binned data are more sensitive to changes in sample size and tuning parameters, and identified some interaction effects among sample size, binning, and tuning parameter on performance.
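
As a rough illustration of what binning does to a continuous variable, the sketch below discretizes simulated data in two common ways, equal-width and equal-frequency, and compares the correlation before and after binning. The abstract does not say which binning schemes, bin counts, or data-generating models the study used, so the function names `equal_width_bins` and `equal_frequency_bins`, the linear-Gaussian toy data, and the choice of three bins are all assumptions made for this example. It does not run GES.

```python
import numpy as np


def equal_width_bins(x, n_bins):
    """Label each value 0..n_bins-1 using intervals of equal width."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Pass only the interior edges so the maximum falls in the last bin.
    return np.digitize(x, edges[1:-1])


def equal_frequency_bins(x, n_bins):
    """Label each value 0..n_bins-1 using quantile cut points."""
    x = np.asarray(x, dtype=float)
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    cut_points = np.quantile(x, probs)
    return np.digitize(x, cut_points)


if __name__ == "__main__":
    # Hypothetical toy model: y depends linearly on x plus Gaussian noise.
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = 0.8 * x + rng.normal(scale=0.6, size=1000)

    r_continuous = np.corrcoef(x, y)[0, 1]
    r_width = np.corrcoef(equal_width_bins(x, 3), equal_width_bins(y, 3))[0, 1]
    r_freq = np.corrcoef(equal_frequency_bins(x, 3), equal_frequency_bins(y, 3))[0, 1]

    print(f"continuous:      {r_continuous:.3f}")
    print(f"equal-width:     {r_width:.3f}")
    print(f"equal-frequency: {r_freq:.3f}")
```

Comparing the three printed correlations gives a quick sense of how much dependence information a given binning scheme retains, which is one plausible reason binning could affect a score-based search such as GES.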

Journal: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 2019, pp. 2574-2581
Categories: cs.LG
Related articles:
arXiv:2102.03274 [cs.LG] (Published 2021-02-05)
On the Sample Complexity of Causal Discovery and the Value of Domain Expertise
arXiv:2205.13869 [cs.LG] (Published 2022-05-27)
MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models
Erdun Gao et al.
arXiv:2307.09552 [cs.LG] (Published 2023-07-18)
Self-Compatibility: Evaluating Causal Discovery without Ground Truth