arXiv:1205.3217 [stat.AP]

A Generalized Fellegi-Sunter Framework for Multiple Record Linkage With Application to Homicide Record Systems

Mauricio Sadinle, Stephen E. Fienberg

Published 2012-05-14, updated 2013-02-06 (Version 2)

We present a probabilistic method for linking multiple datafiles. This task is not trivial in the absence of unique identifiers for the individuals recorded. This is a common scenario when linking census data to coverage measurement surveys for census coverage evaluation, and more generally when multiple record systems need to be integrated for subsequent analysis. Our method generalizes the Fellegi-Sunter theory for linking records from two datafiles, together with its modern implementations. The goal of multiple record linkage is to classify the record K-tuples coming from K datafiles according to their different matching patterns. Our method incorporates the transitivity of agreement in the computation of the data used to model matching probabilities. We use a mixture model to fit matching probabilities via maximum likelihood using the EM algorithm. We present a method to decide the membership of record K-tuples in the subsets of matching patterns, and we prove its optimality. We apply our method to the integration of three Colombian homicide record systems, and we perform a simulation study to explore the performance of the method under measurement error and in different scenarios. The proposed method performs well and opens several directions for future research.
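The abstract does not spell out how matching patterns or transitivity are handled, so the following Python sketch is only an illustration under stated assumptions. It assumes that a matching pattern of a record K-tuple can be identified with a partition of the K records into groups referring to the same individual. It also reads "transitivity of agreement" narrowly, as the fact that exact agreement on a single field is transitive across record pairs. The function names (`set_partitions`, `pairwise_agreement`, `pattern_to_agreement`, `observable_agreement_vectors`) are hypothetical and do not come from the paper, whose comparison data may well be richer than binary exact agreement.

```python
from itertools import combinations, product


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists of lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def pairwise_agreement(values):
    """Exact-agreement indicators of one field for every pair (i, j), i < j."""
    return tuple(int(a == b) for a, b in combinations(values, 2))


def pattern_to_agreement(partition, k):
    """Pairwise indicators implied by a pattern: 1 iff records i, j share a block."""
    block_of = {r: b for b, block in enumerate(partition) for r in block}
    return tuple(int(block_of[i] == block_of[j])
                 for i, j in combinations(range(k), 2))


def observable_agreement_vectors(k, alphabet="abcdefgh"):
    """All pairwise exact-agreement vectors that k field values can produce.

    Using k distinct symbols is enough to realize every way k values can
    coincide, so this enumerates every observable vector (assumes k <= 8).
    """
    return {pairwise_agreement(v) for v in product(alphabet[:k], repeat=k)}


if __name__ == "__main__":
    patterns = list(set_partitions(range(3)))
    for p in patterns:
        print(p, pattern_to_agreement(p, 3))
    print(sorted(observable_agreement_vectors(3)))
```

Under these assumptions, three datafiles give five matching patterns, and in general the count is the Bell number of K (1, 2, 5, 15, 52, ...). For K = 3, only 5 of the 2^3 = 8 binary vectors over the pairs (1,2), (1,3), (2,3) can be observed: a vector such as (1, 1, 0) is impossible because agreement of record 1 with records 2 and 3 forces records 2 and 3 to agree.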

Comments: Several changes with respect to the previous version. Accepted in the Journal of the American Statistical Association.
Categories: stat.AP, stat.ME, stat.ML, stat.OT