US Government Technical Reports

Using LSA to Compute Word Sense Frequencies; Final rept. Jul 2005-Nov 2007

Abstract

This document describes a project that explored the use of Latent Semantic Analysis (LSA) and statistical clustering techniques to identify word senses automatically and to estimate word sense frequencies from application-relevant corpora. The hypothesis is that LSA can be used to compute context vectors for occurrences of an ambiguous word, and that these vectors can be clustered so that each cluster corresponds to a different sense of the word. The document is organized as follows. Section 1 gives a short introduction to LSA, introduces the context-group discrimination paradigm adopted in the project, and describes the corpus used in the experiments. Section 2 describes the investigation of the effect of LSA dimensionality on sense discrimination accuracy. Overall, sense discrimination accuracy was relatively low. This motivated a digression into several further investigations: the influence of different distance measures; the geometry of the sense clusters in the LSA-based space, examined through silhouette value analysis; sense discrimination accuracy as a function of the degree of supervision provided during model training; and a comparison of sense discrimination for homonyms versus polysemes. Section 3 describes the search for the optimal context size for word sense discrimination, from 3 words (1 word on each side of the target word) to 11 words (5 words on each side). Section 4 describes the use of Minimum Description Length (MDL) to determine the number of word senses. Section 5 provides a project summary. Appendix A provides a literature review, and Appendix B provides a source code listing (not included in this published report).
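The context-group discrimination idea described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's own code: the term-by-document matrix is weighted with log(1 + count), `dims` is the number of LSA dimensions kept (the parameter studied in Section 2), a context vector is the normalized sum of the LSA vectors of up to `k` words on each side of the target (so `k` = 1 to 5 corresponds to the 3- to 11-word windows of Section 3), and the clustering step is a spherical (cosine) k-means with a deterministic farthest-point start. The function names and the toy corpus are hypothetical.

```python
import numpy as np

def lsa_word_vectors(docs, dims):
    """Return (index, word_vecs): LSA term vectors from a term-by-document matrix."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            counts[index[w], j] += 1
    # Assumed weighting: log(1 + count). Log-entropy weighting is another common choice.
    weighted = np.log1p(counts)
    # Truncated SVD: keep the `dims` largest singular values (the LSA dimensionality).
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return index, u[:, :dims] * s[:dims]

def context_vectors(docs, target, index, word_vecs, k):
    """One unit vector per occurrence of `target`: the sum of the LSA vectors
    of up to k words on each side (a window of at most 2k + 1 words)."""
    vecs = []
    for doc in docs:
        for i, w in enumerate(doc):
            if w != target:
                continue
            window = doc[max(0, i - k):i] + doc[i + 1:i + 1 + k]
            if not window:
                continue
            v = np.sum([word_vecs[index[c]] for c in window], axis=0)
            vecs.append(v / (np.linalg.norm(v) or 1.0))
    return np.array(vecs)

def spherical_kmeans(x, n_clusters, iters=20):
    """Cluster unit vectors by cosine similarity; returns one label per row."""
    # Deterministic start: the first vector, then repeatedly the vector
    # least similar to every centroid chosen so far.
    chosen = [0]
    while len(chosen) < n_clusters:
        closest = np.max(x @ x[chosen].T, axis=1)
        chosen.append(int(np.argmin(closest)))
    centroids = x[chosen].copy()
    for _ in range(iters):
        labels = np.argmax(x @ centroids.T, axis=1)  # most similar centroid
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members):
                m = members.sum(axis=0)
                centroids[c] = m / (np.linalg.norm(m) or 1.0)
    return labels

# Invented toy corpus: "bank" in a river sense and in a finance sense.
docs = [s.split() for s in [
    "river bank water fish", "water river bank fish", "fish water bank river",
    "money bank loan deposit", "loan money bank deposit", "deposit loan bank money",
]]
index, word_vecs = lsa_word_vectors(docs, dims=2)
ctx = context_vectors(docs, "bank", index, word_vecs, k=2)
labels = spherical_kmeans(ctx, n_clusters=2)
print(labels)  # [0 0 0 1 1 1]
```

Each cluster is then treated as one sense, and the cluster sizes give the estimated sense frequencies. On a real corpus, `dims` and `k` would be varied and the resulting clusters compared against sense-tagged occurrences.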
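Silhouette value analysis, mentioned in the abstract, scores each point by comparing its mean distance to the rest of its own cluster (a) with its mean distance to the nearest other cluster (b), as s = (b - a) / max(a, b). Values near 1 indicate compact, well-separated clusters; negative values indicate points closer to another cluster. The sketch below is a generic plain-Python illustration that accepts any distance function, so the same clustering can be examined under Euclidean and cosine distance. The function names are hypothetical, and the abstract does not list which distance measures the report actually compared.

```python
import math
from typing import Callable, Hashable, List, Sequence

Vector = Sequence[float]

def euclidean_distance(u: Vector, v: Vector) -> float:
    return math.dist(u, v)

def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

def silhouette_values(points: Sequence[Vector], labels: Sequence[Hashable],
                      dist: Callable[[Vector, Vector], float]) -> List[float]:
    """Silhouette value s(i) = (b - a) / max(a, b) for every point."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette values need at least two clusters")
    values = []
    for i, p in enumerate(points):
        # Sum and count of distances from p to all other points, per cluster label.
        totals: dict = {}
        counts: dict = {}
        for j, q in enumerate(points):
            if j != i:
                totals[labels[j]] = totals.get(labels[j], 0.0) + dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        if labels[i] not in counts:  # singleton cluster: s(i) = 0 by convention
            values.append(0.0)
            continue
        a = totals[labels[i]] / counts[labels[i]]
        b = min(totals[c] / counts[c] for c in counts if c != labels[i])
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1], euclidean_distance)
bad = silhouette_values(points, [0, 1, 0, 1], euclidean_distance)
print(round(sum(good) / len(good), 2), round(sum(bad) / len(bad), 2))  # 0.9 -0.45
```

Averaging the values per cluster shows which senses form tight regions of the space and which overlap with others.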
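The abstract does not say how the MDL criterion was formulated. As one hedged illustration, the number of senses can be chosen by comparing candidate clusterings (for example, k-means runs with different numbers of clusters) and keeping the one with the shortest approximate two-part code: the cost of describing the cluster model plus the cost of describing the data given that model. The spherical Gaussian cluster model, the `precision` floor on the variance, and the function names below are assumptions made for this sketch, not the report's method.

```python
import math
from typing import Hashable, Sequence

def description_length(points: Sequence[Sequence[float]], labels: Sequence[Hashable],
                       precision: float = 0.1) -> float:
    """Approximate two-part MDL code length, in nats, of a hard clustering.

    Assumed model: each cluster is a spherical Gaussian around its centroid with
    one shared variance. Coordinates are coded to `precision`, so the variance is
    floored at precision**2; otherwise a perfect fit (e.g. one point per cluster)
    would get an unboundedly short code.
    """
    n, d = len(points), len(points[0])
    clusters = set(labels)
    k = len(clusters)
    rss = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        rss += sum(math.dist(p, centroid) ** 2 for p in members)
    variance = max(rss / (n * d), precision ** 2)
    # Data part: negative log-likelihood of the residuals (the constant
    # -log(precision) per coordinate is the same for every k and is omitted).
    data_cost = 0.5 * n * d * math.log(2 * math.pi * variance) + rss / (2 * variance)
    # Model part: k centroids of d coordinates, about 0.5 * log(n) nats per parameter.
    model_cost = 0.5 * k * d * math.log(n)
    # Cluster label of each point, with all k labels assumed equally likely.
    label_cost = n * math.log(k)
    return data_cost + model_cost + label_cost

def choose_num_senses(points, candidate_labelings, precision=0.1):
    """Return the number of clusters in the labelling with the shortest description."""
    best = min(candidate_labelings,
               key=lambda labs: description_length(points, labs, precision))
    return len(set(best))

points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
candidates = [
    [0, 0, 0, 0, 0, 0],  # one sense
    [0, 0, 0, 1, 1, 1],  # two senses
    [0, 0, 1, 2, 2, 2],  # three senses
    [0, 1, 2, 3, 4, 5],  # one "sense" per occurrence
]
print(choose_num_senses(points, candidates))  # 2
```

More clusters always reduce the residual cost, but each extra cluster adds model and label cost, so the shortest total code favours the smallest number of senses that still explains the context vectors well.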
