
Subject classification in the Oxford English Dictionary


Abstract

The Oxford English Dictionary is a valuable source of lexical information and a rich testing ground for mining highly structured text. Each entry is organized into a hierarchy of senses, which include definitions, labels, and cited quotations. Subject labels indicate the subject classification of a sense; for example, they signal how a word may be used in anthropology, music, or computing. Unfortunately, subject labeling in the dictionary is incomplete. To overcome this incompleteness, we attempt to classify the senses (i.e., definitions) in the dictionary by their subjects, using the citations as an information guide. We report on four different approaches: k nearest neighbors, a standard classification technique; term weighting, an information retrieval method for text; naive Bayes, a probabilistic method; and expectation maximization, an iterative probabilistic method. The experimental performance of these methods is compared using standard classification metrics.
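As a rough illustration of one of the four approaches named above, the sketch below shows how a multinomial naive Bayes classifier with add-one smoothing could assign a subject label to a sense from the words in its citations. It is not the authors' implementation: the toy citations, the whitespace tokenizer, the smoothing choice, and the names `TOY_SENSES`, `train_naive_bayes`, and `classify` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (subject label, citation text) pairs.
# Real input would be the quotations cited under each labeled sense.
TOY_SENSES = [
    ("music", "the violin played a slow melody in the concert"),
    ("music", "a chord is struck on the piano keyboard"),
    ("computing", "the program stores data in memory"),
    ("computing", "the keyboard sends input to the computer program"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_senses):
    """Count senses per subject and word occurrences per subject."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_senses:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the subject with the highest log posterior for the citation text."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training senses carrying this subject label.
        score = math.log(class_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TOY_SENSES)
```

With this toy model, `classify("a melody for violin and piano", model)` selects `music`. An unlabeled sense would be handled the same way, by concatenating its citations and passing them to `classify`.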
