Data Fusion by Matrix Factorization
Published 2013-07-02, Version 1
For most problems in science and engineering we can obtain data that describe the system from various perspectives and record the behaviour of its individual components. Heterogeneous data sources can be mined collectively by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with data on the context or on additional constraints. In this paper we describe a data fusion approach based on penalized matrix tri-factorization that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly incorporate any data set that can be expressed as a matrix, including those from attribute-based representations, ontologies, associations and networks. We demonstrate its utility on a gene function prediction problem in a case study with eleven different data sources. Our fusion algorithm compares favourably to state-of-the-art multiple kernel learning and achieves higher accuracy than can be obtained from any single data source alone.
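To make the idea of simultaneous tri-factorization concrete, here is a minimal sketch, not the authors' algorithm. It assumes each relation matrix R_ij between object types i and j is approximated as G_i S_ij G_j^T, where the factor G_i is shared by every relation that involves type i and is kept nonnegative, and that the penalty takes the form tr(G_i^T Θ_i G_i) for a constraint matrix Θ_i. The objective is therefore the sum of squared errors ||R_ij − G_i S_ij G_j^T||² plus these trace penalties; this is our reading of "penalized matrix tri-factorization". The optimizer is also swapped in for simplicity: a closed-form least-squares update for each S_ij followed by projected gradient steps with backtracking for each G_i. The function names `objective`, `grad_G` and `fit`, the ranks, and the toy data are all hypothetical.

```python
import numpy as np


def objective(R, G, S, Theta):
    """Squared reconstruction error of every relation plus trace penalties."""
    loss = 0.0
    for (i, j), Rij in R.items():
        resid = Rij - G[i] @ S[(i, j)] @ G[j].T
        loss += np.sum(resid ** 2)
    for i, Ti in Theta.items():
        loss += np.trace(G[i].T @ Ti @ G[i])
    return float(loss)


def grad_G(t, R, G, S, Theta):
    """Gradient of the objective with respect to the shared factor G[t]."""
    g = np.zeros_like(G[t])
    for (i, j), Rij in R.items():
        resid = Rij - G[i] @ S[(i, j)] @ G[j].T
        if i == t:  # object type t indexes the rows of R_ij
            g -= 2.0 * resid @ G[j] @ S[(i, j)].T
        if j == t:  # object type t indexes the columns of R_ij
            g -= 2.0 * resid.T @ G[i] @ S[(i, j)]
    if t in Theta:
        g += (Theta[t] + Theta[t].T) @ G[t]
    return g


def fit(R, ranks, Theta=None, n_iter=50, seed=0):
    """Jointly tri-factorize R[(i, j)] ~ G[i] @ S[(i, j)] @ G[j].T with shared G."""
    rng = np.random.default_rng(seed)
    Theta = Theta or {}
    sizes = {}
    for (i, j), Rij in R.items():
        sizes[i], sizes[j] = Rij.shape
    G = {t: rng.random((n, ranks[t])) for t, n in sizes.items()}
    S = {(i, j): np.zeros((ranks[i], ranks[j])) for (i, j) in R}
    history = []
    for _ in range(n_iter):
        # With every G fixed, each S_ij has a closed-form least-squares solution.
        for (i, j), Rij in R.items():
            S[(i, j)] = np.linalg.pinv(G[i]) @ Rij @ np.linalg.pinv(G[j]).T
        # Projected gradient step on each G_t, halving the step until the loss drops.
        for t in G:
            current = objective(R, G, S, Theta)
            direction = grad_G(t, R, G, S, Theta)
            old = G[t]
            eta = 1.0
            while eta > 1e-10:
                G[t] = np.maximum(old - eta * direction, 0.0)  # keep G_t >= 0
                if objective(R, G, S, Theta) < current:
                    break
                eta *= 0.5
            else:
                G[t] = old  # no improving step found; keep the previous factor
        history.append(objective(R, G, S, Theta))
    return G, S, history


rng = np.random.default_rng(1)
# Hypothetical toy data: object type 0 (e.g. genes) related to types 1 and 2.
R = {
    (0, 1): rng.random((30, 4)) @ rng.random((4, 10)),
    (0, 2): rng.random((30, 4)) @ rng.random((4, 15)),
}
ranks = {0: 4, 1: 3, 2: 3}
# Assumed nonnegative constraint matrix for type 0, so the trace penalty stays >= 0.
Theta = {0: 0.01 * (np.ones((30, 30)) - np.eye(30))}
G, S, history = fit(R, ranks, Theta, n_iter=30)
scores = G[0] @ S[(0, 1)] @ G[1].T  # reconstructed target relation
print(history[0], history[-1])
```

Because G[0] is shared by both relations, information from the auxiliary relation R[(0, 2)] shapes the reconstruction of the target relation R[(0, 1)]; this sharing is what lets a fused model draw on more than one data source. The backtracking makes each G update non-increasing by construction, which keeps the sketch simple at the cost of speed compared with dedicated multiplicative update rules.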