We present a novel method for mapping thematic trends called "Classification by Preferential Clustered Link" (CPCL). This method clusters relevant textual units (terms) from a corpus of texts, based on meaningful linguistic relations (syntactic variations) identified among the units. Terms related through syntactic variations are represented as a graph and are first clustered into connected components using the subset of variation relations affecting the modifier word(s) in a term. The connected components are in turn clustered into classes using the subset of variation relations affecting the head word in a term. Through a chronological analysis of the terms, the method pinpoints the evolution of research topics. The CPCL method differs from classical data analysis methods in that it integrates meaningful linguistic relations as classification criteria. The method also avoids the bias caused by fixing class size before classification, which splits classes artificially during clustering. The graph formalism, the theoretical model underlying the CPCL method, offers a powerful means of representing the linguistic relations between terms.
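The two-stage grouping described above can be illustrated with a minimal Python sketch. It is not the CPCL implementation: the function names, the example terms, and the "modifier"/"head" relation labels are hypothetical. The abstract does not say how components are merged into classes, so the second stage here simply joins any components connected by at least one head variation, which is an assumption and does not model any preferential weighting of links.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components using a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def two_stage_clustering(terms, relations):
    """relations: (term_a, term_b, kind) triples, kind in {"modifier", "head"}."""
    # Stage 1: connected components over modifier variations only.
    modifier_edges = [(a, b) for a, b, kind in relations if kind == "modifier"]
    components = connected_components(terms, modifier_edges)

    # Map each term to the index of its component.
    comp_of = {t: i for i, comp in enumerate(components) for t in comp}

    # Stage 2 (assumed rule): join components linked by a head variation.
    head_edges = [
        (comp_of[a], comp_of[b])
        for a, b, kind in relations
        if kind == "head" and comp_of[a] != comp_of[b]
    ]
    groups = connected_components(range(len(components)), head_edges)
    classes = [set().union(*(components[i] for i in group)) for group in groups]
    return components, classes


# Hypothetical terms and variation relations, for illustration only.
terms = ["wheat flour", "soft wheat flour", "wheat dough",
         "rye bread", "dark rye bread"]
relations = [
    ("wheat flour", "soft wheat flour", "modifier"),
    ("wheat flour", "wheat dough", "head"),
    ("rye bread", "dark rye bread", "modifier"),
]
components, classes = two_stage_clustering(terms, relations)
print(components)
print(classes)
```

In this sketch no class size is fixed in advance: the number and size of the classes follow entirely from the relations supplied, which mirrors the point made above about avoiding artificially split classes.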