This paper presents a machine learning approach to acronym generation. We formalize the generation process as a sequence labeling problem on the letters in the definition (expanded form) so that a variety of Markov modeling approaches can be applied to this task. To construct the data for training and testing, we extracted acronym-definition pairs from MEDLINE abstracts and manually annotated each pair with positional information about the letters in the acronym. We have built an MEMM-based tagger using this training data set and evaluated the performance of acronym generation. Experimental results show that our machine learning method gives significantly better performance than that achieved by the standard heuristic rule for acronym generation and enables us to obtain multiple candidate acronyms together with their likelihoods represented as probability values.
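
To make the sequence-labeling framing concrete, the following minimal sketch shows how an acronym-definition pair could be turned into one label per character of the definition, and how a label sequence maps back to an acronym. The two-label scheme (`KEEP`/`SKIP`), the function names, and the first-letter baseline are illustrative assumptions, not the label set, tagger, or heuristic rule used in the paper; no MEMM is trained here.

```python
def first_letter_baseline(definition: str) -> str:
    """Assumed heuristic baseline: upper-cased first letter of every word."""
    return "".join(word[0].upper() for word in definition.split())


def encode(definition: str, positions: list[int]) -> list[str]:
    """Turn annotated character positions into one label per character.

    Hypothetical scheme: "KEEP" marks a character that appears in the
    acronym, "SKIP" marks every other character (including spaces).
    """
    keep = set(positions)
    return ["KEEP" if i in keep else "SKIP" for i in range(len(definition))]


def decode(definition: str, labels: list[str]) -> str:
    """Rebuild the acronym from a label sequence (as a tagger would emit)."""
    if len(labels) != len(definition):
        raise ValueError("need exactly one label per character")
    return "".join(
        ch.upper() for ch, label in zip(definition, labels) if label == "KEEP"
    )


if __name__ == "__main__":
    definition = "deoxyribonucleic acid"
    # Positions of 'd', 'n', and 'a' in the definition (0-indexed).
    labels = encode(definition, [0, 9, 17])
    print(decode(definition, labels))         # label-based generation
    print(first_letter_baseline(definition))  # first-letter heuristic
```

In this framing, a trained tagger would assign a probability to each candidate label sequence, so several decoded acronyms could be ranked by likelihood; the sketch only covers the encoding and decoding steps around such a tagger.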