arXiv:1109.5078 [cs.LG]

Application of distances between terms for flat and hierarchical data

Jorge-Alonso Bedoya-Puerta, Jose Hernandez-Orallo

Published 2011-09-23, Version 1

In machine learning, distance-based algorithms and other approaches use information represented as propositional data. However, this kind of representation can be quite restrictive, and in many cases more complex structures are needed to represent data in a more natural way. Terms are the basis of functional and logic programming representations. Distances between terms are a useful tool not only for comparing terms, but also for determining the search space in many of these applications. This dissertation applies distances between terms, exploiting the features of each distance and the possibility of comparing data ranging from propositional types to hierarchical representations. The distances between terms are applied through the k-NN (k-nearest neighbor) classification algorithm, using XML as a common representation language. To represent these data in an XML structure and to take advantage of the benefits of distances between terms, some transformations are necessary. These transformations convert flat data into hierarchical data represented in XML, using techniques based on intuitive associations between the names and values of variables and on associations based on attribute similarity. Several experiments were performed with the term distances of Nienhuys-Cheng and of Estruch et al. For originally propositional data, these distances are compared to the Euclidean distance. In all cases, the experiments used the distance-weighted k-nearest neighbor algorithm with several exponents for the attraction function (weighted distance). In some cases, the term distances significantly improve on the results of approaches applied to flat representations.
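
To make the combination of a term distance with distance-weighted k-NN more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the Nienhuys-Cheng distance in its commonly stated form for ground terms (0 for identical constants, 1 for different functors or arities, otherwise the sum of the argument distances divided by 2n), and it assumes the attraction function has the form 1 / d^i for an exponent i. The helper names (term, nienhuys_cheng, weighted_knn) and the toy animal data are hypothetical and do not come from the dissertation; the XML encoding step is omitted and terms are written directly as nested tuples.

```python
from collections import defaultdict


def term(functor, *args):
    """Build a ground term as (functor, subterms); a constant has no subterms."""
    return (functor, tuple(args))


def nienhuys_cheng(s, t):
    """Assumed Nienhuys-Cheng-style distance between two ground terms, in [0, 1]."""
    (f, s_args), (g, t_args) = s, t
    if f != g or len(s_args) != len(t_args):
        return 1.0  # different functor or arity: maximal distance
    if not s_args:
        return 0.0  # identical constants
    n = len(s_args)
    # Same functor: halve the mean argument distance, so deeper
    # differences contribute less than differences near the root.
    return sum(nienhuys_cheng(a, b) for a, b in zip(s_args, t_args)) / (2 * n)


def weighted_knn(query, examples, k=3, exponent=2, distance=nienhuys_cheng):
    """Distance-weighted k-NN; each neighbor votes with weight 1 / d**exponent.

    The 1 / d**exponent attraction function is an assumption of this sketch.
    """
    scored = sorted(
        ((distance(query, x), label) for x, label in examples),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for d, label in scored[:k]:
        if d == 0.0:
            return label  # an exact match decides the class outright
        votes[label] += 1.0 / d ** exponent
    return max(votes, key=votes.get)


def animal(legs, cover, diet):
    """Hypothetical hierarchical encoding of a flat (legs, cover, diet) record."""
    return term("animal", term("body", term(legs), term(cover)), term("diet", term(diet)))


# Toy, made-up training data.
examples = [
    (animal("four_legs", "fur", "herbivore"), "mammal"),
    (animal("four_legs", "fur", "carnivore"), "mammal"),
    (animal("two_legs", "feathers", "herbivore"), "bird"),
    (animal("two_legs", "feathers", "carnivore"), "bird"),
]

query = animal("four_legs", "scales", "herbivore")
prediction = weighted_knn(query, examples, k=3, exponent=2)
```

In this sketch, grouping the attributes under body and diet changes how much each mismatch costs, which is the kind of effect the flat-to-hierarchical transformations are meant to exploit. Raising the exponent makes the closest neighbors dominate the vote.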
