The relative difference between two data values is of interest in a number of application domains including temporal and spatial applications, schema versioning, data warehousing (particularly data preparation), internet searching, validation and error correction, and data mining. Moreover, consistency across systems in determining such distances and the robustness of such calculations are essential in some domains and useful in many. Despite this, there is no generally adopted approach to determining such distances and no accommodation of distance within SQL or any commercially available DBMS. For non-numeric data values, calculating the difference between values often requires application-specific support, but even for numeric values the practical distance between two values may not simply be their numeric difference or Euclidean distance. In this paper, a model of semantic distance is developed in which a graph-based approach is used to quantify the distance between two data values. The approach facilitates a notion of distance, both as a simple traversal distance and as weighted arcs. Transition costs, as an additional expense of passing through a node, are also accommodated. Furthermore, multiple distance measures can be incorporated, and a method of ‘localisation’ is discussed which allows relevant information to take precedence over less relevant information. Some results from our investigations, including our SQL-based implementation, are presented.
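As an illustration only, the graph-based notion of distance described above can be sketched as a cheapest-path search in which each arc carries a weight and each intermediate node may add a transition cost. The sketch below is in Python and is not the SQL-based implementation reported in the paper. The function name `semantic_distance`, the colour values, and all weights and costs are hypothetical, and charging the transition cost only at intermediate nodes (not at the source or target) is an assumption.

```python
import heapq


def semantic_distance(arcs, transition_cost, source, target):
    """Return the cheapest path cost from source to target, or None.

    arcs: dict mapping a node to a list of (neighbour, arc_weight) pairs.
    transition_cost: dict mapping a node to the extra cost of passing
        through it (assumed to apply to intermediate nodes only).
    All weights and costs are assumed to be non-negative.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        # Leaving an intermediate node incurs its transition cost.
        through = 0.0 if node == source else transition_cost.get(node, 0.0)
        for neighbour, weight in arcs.get(node, []):
            candidate = dist + through + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None  # target not reachable


# Hypothetical value graph: arcs are listed in both directions.
colour_arcs = {
    "red": [("orange", 1.0), ("yellow", 5.0)],
    "orange": [("red", 1.0), ("yellow", 1.0)],
    "yellow": [("orange", 1.0), ("red", 5.0)],
}

cheap_transit = semantic_distance(colour_arcs, {"orange": 2.0}, "red", "yellow")
costly_transit = semantic_distance(colour_arcs, {"orange": 4.0}, "red", "yellow")
```

With unit arc weights and no transition costs the same function reduces to a simple traversal distance (the number of arcs on the shortest path). In the hypothetical data above, raising the transition cost at `orange` makes the direct arc from `red` to `yellow` the cheaper route.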