This work describes the derivation of a probabilistic grammar for main subject field codes from the machine-readable version of the Longman Dictionary of Contemporary English (LDOCE) (P. Procter, 1978). The dictionary uses these codes to mark the subject area to which a particular sense of a word belongs. The grammar consists of the dictionary itself and a matrix that describes how closely two main subject fields are related to each other in a large training corpus of unrestricted English text.
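To make the idea of a subject field relatedness matrix concrete, the following is a minimal sketch, not the method used in the work itself. It assumes that relatedness is estimated by counting how often two subject codes appear in the same sentence; the abstract does not say how the matrix is actually computed. The lexicon, the two-letter codes, the sentences, and the function name `cooccurrence_counts` are all hypothetical placeholders and are not taken from LDOCE.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical toy lexicon: word -> subject codes of its senses.
# These codes are illustrative placeholders, not actual LDOCE codes.
LEXICON = {
    "bank": {"EC", "GG"},
    "loan": {"EC"},
    "river": {"GG"},
    "court": {"LW", "SP"},
    "judge": {"LW"},
}

# Hypothetical stand-in for a large training corpus.
SENTENCES = [
    "the bank approved the loan",
    "the judge left the court",
    "the bank sued in court",
]


def cooccurrence_counts(sentences, lexicon):
    """Count, for each pair of subject codes, the sentences containing both.

    Assumption: sentence-level co-occurrence is used as the relatedness
    signal. Keys are alphabetically sorted (code_a, code_b) tuples, so the
    table is symmetric; (code, code) keys hold per-code sentence counts.
    """
    counts = Counter()
    for sentence in sentences:
        codes = set()
        for word in sentence.lower().split():
            # Words missing from the lexicon contribute no codes.
            codes |= lexicon.get(word, set())
        for pair in combinations_with_replacement(sorted(codes), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for pair, n in sorted(cooccurrence_counts(SENTENCES, LEXICON).items()):
        print(pair, n)
```

Raw counts like these would normally be normalized (for example, by the per-code counts on the diagonal) before being treated as probabilities; that step is omitted here because the abstract does not specify it.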