Journal: International journal of computer mathematics

An automatic filtering method for field association words by deleting unnecessary words

Abstract

Document classification and summarization are important for document text retrieval. Humans can generally recognize the field of a document, such as Sports or Politics, from specific words called Field Association (FA) words. The traditional method registers misleading redundant words (unnecessary words) as FA words, because the quality of the resulting FA words depends on learning data pre-classified by hand. Consequently, when the hand-assigned fields are ambiguous, the recall and precision of document classification degrade. We propose two criteria for deleting unnecessary words: one based on low frequency and one based on category information. Using these criteria, unnecessary words can be deleted from an FA word dictionary built by the traditional method. In experiments, the proposed method automatically identified and deleted 25% of 38,372 FA word candidates as unnecessary. Compared with the traditional method, precision and F-measure improved by 26% and 15%, respectively.
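The abstract names two deletion criteria but does not define them precisely, so the following is only a hypothetical illustration of how such filtering could work, not the authors' method. It assumes that criterion 1 drops candidates whose total frequency in the hand-classified learning data falls below a threshold, and it interprets criterion 2 as dropping candidates whose occurrences are not concentrated in a single field category. The function name `filter_fa_candidates`, the constants `MIN_FREQ` and `MIN_CONCENTRATION`, their values, and the toy documents are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds: the abstract does not state the paper's actual values.
MIN_FREQ = 3              # criterion 1 (assumed): minimum total frequency to keep a candidate
MIN_CONCENTRATION = 0.75  # criterion 2 (assumed): minimum share of occurrences in the top category


def filter_fa_candidates(tagged_docs, min_freq=MIN_FREQ, min_concentration=MIN_CONCENTRATION):
    """Split FA-word candidates into (kept, deleted).

    tagged_docs: iterable of (category, words) pairs, i.e. hand-classified learning data.
    Returns kept: dict word -> field it is associated with, and deleted: set of words.
    """
    # Count how often each candidate word occurs in each category.
    per_word = defaultdict(Counter)
    for category, words in tagged_docs:
        for word in words:
            per_word[word][category] += 1

    kept, deleted = {}, set()
    for word, by_category in per_word.items():
        total = sum(by_category.values())
        # Criterion 1 (sketch): low-frequency candidates are treated as unnecessary.
        if total < min_freq:
            deleted.add(word)
            continue
        # Criterion 2 (sketch): candidates spread across categories carry little field information.
        top_category, top_freq = by_category.most_common(1)[0]
        if top_freq / total < min_concentration:
            deleted.add(word)
            continue
        kept[word] = top_category
    return kept, deleted


# Toy hand-classified learning data (invented for illustration only).
EXAMPLE_DOCS = [
    ("Sports", ["goal", "match", "team", "today"]),
    ("Sports", ["goal", "team", "coach", "today"]),
    ("Sports", ["goal", "match", "team"]),
    ("Politics", ["election", "vote", "today", "party"]),
    ("Politics", ["election", "vote", "party", "today"]),
    ("Politics", ["election", "party", "vote"]),
]

if __name__ == "__main__":
    kept, deleted = filter_fa_candidates(EXAMPLE_DOCS)
    print("kept:", kept)
    print("deleted:", sorted(deleted))
```

On the toy data, "match" and "coach" fall below the assumed frequency threshold, and "today" is removed because it occurs equally often in both fields; "goal", "team", "election", "vote", and "party" survive as FA words for their respective fields.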