IEEE International Conference on Data Mining

Subject classification in the Oxford English Dictionary



Abstract

The Oxford English Dictionary is a valuable source of lexical information and a rich testing ground for mining highly structured text. Each entry is organized into a hierarchy of senses, which include definitions, labels, and cited quotations. Subject labels give the subject classification of a sense; for example, they signal how a word is used in Anthropology, Music, or Computing. Unfortunately, subject labeling in the dictionary is incomplete. To overcome this incompleteness, we attempt to classify the senses (i.e., definitions) in the dictionary by subject, using their cited quotations as an information guide. We report on four approaches: k-Nearest Neighbors, a standard classification technique; Term Weighting, an information retrieval method for text; Naive Bayes, a probabilistic method; and Expectation Maximization, an iterative probabilistic method. We compare the experimental performance of these methods using standard classification metrics.
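The abstract names the classifiers but does not describe how they are applied. As a rough illustration only, the Python sketch below shows two of them, multinomial Naive Bayes and k-nearest neighbors, assigning a subject to an unlabeled sense from the words of its quotations. It assumes each sense is reduced to a bag of quotation words. The training data, stop-word list, and function names (`train_naive_bayes`, `classify_naive_bayes`, `classify_knn`) are hypothetical and not taken from the paper, which may represent senses and weight terms differently.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each labeled sense is reduced to the words of its
# cited quotations, paired with the subject label it carries.
TRAINING = [
    ("the orchestra tuned before the symphony began", "Music"),
    ("she played the sonata on the violin", "Music"),
    ("the compiler rejected the program with a syntax error", "Computing"),
    ("data was stored in memory before the program ran", "Computing"),
    ("the tribe's kinship customs were recorded in fieldwork", "Anthropology"),
    ("ritual and kinship shape the culture of the village", "Anthropology"),
]

# Assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "with",
             "was", "were", "before", "she"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def train_naive_bayes(examples):
    """Collect per-subject word counts, subject priors, and the vocabulary."""
    word_counts = defaultdict(Counter)
    subject_counts = Counter()
    vocab = set()
    for text, subject in examples:
        tokens = tokenize(text)
        word_counts[subject].update(tokens)
        subject_counts[subject] += 1
        vocab.update(tokens)
    return word_counts, subject_counts, vocab


def classify_naive_bayes(model, text):
    """Pick the subject with the highest log posterior (Laplace smoothing)."""
    word_counts, subject_counts, vocab = model
    total = sum(subject_counts.values())
    best, best_score = None, -math.inf
    for subject, n in subject_counts.items():
        denom = sum(word_counts[subject].values()) + len(vocab)
        score = math.log(n / total)  # log prior
        for token in tokenize(text):
            if token in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[subject][token] + 1) / denom)
        if score > best_score:
            best, best_score = subject, score
    return best


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_knn(examples, text, k=3):
    """Majority vote over the k labeled senses most similar to the query."""
    query = Counter(tokenize(text))
    neighbors = sorted(
        ((cosine(query, Counter(tokenize(t))), s) for t, s in examples),
        key=lambda pair: pair[0],
        reverse=True,  # most similar first; ties keep training order
    )
    votes = Counter(subject for _, subject in neighbors[:k])
    return votes.most_common(1)[0][0]


model = train_naive_bayes(TRAINING)
print(classify_naive_bayes(model, "the violin sonata"))  # Music
print(classify_knn(TRAINING, "the program stored data in memory"))  # Computing
```

Naive Bayes pools word counts across all senses sharing a subject, while k-NN compares the query with individual labeled senses. Expectation Maximization, as it is commonly applied to text, would extend the Naive Bayes model by iteratively re-estimating these counts from unlabeled senses; that step is not shown here.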
