In this article, we present the use of a term bank for text classification. We developed a supervised text classification approach that takes advantage of both the domain-based structure and the bilingual content of a term bank, namely TERMIUM Plus. The goal of the text classification task is to correctly identify the appropriate fine-grained domains of short segments of text in both French and English. We developed a vector space model for this task, which we refer to as the DCVSM (domain classification vector space model). To train and evaluate the DCVSM, we generated two new datasets from the open data contained in TERMIUM Plus. Results on these datasets show that the DCVSM compares favourably to the five other supervised classification algorithms tested, achieving the highest micro-averaged recall (R@1).