Conference: International Conference on String Processing and Information Retrieval

Indexing Text Documents Based on Topic Identification

Abstract

This work provides algorithms and heuristics to index text documents by determining important topics in the documents. To index text documents, the work provides algorithms to generate topic candidates, determine their importance, detect similar and synonymous topics, and eliminate incoherent topics. The indexing algorithm uses topic frequency to determine the importance and the existence of the topics. Repeated phrases are topic candidates. For example, since the phrase 'index text documents' occurs three times in this abstract, the phrase is one of the topics of this abstract. It is shown that this method is more effective than either a simple word-count model or approaches based on term weighting.
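The abstract's central idea is that repeated phrases are topic candidates and frequency determines their importance. The minimal Python sketch below illustrates that idea. It is not the authors' algorithm. The following are all illustrative assumptions: the sentence splitting, the lowercase word tokenizer, the n-gram length limits (`min_len`, `max_len`), the `min_count` threshold, and the rule that drops a shorter phrase when a longer candidate containing it occurs equally often. The sketch also omits the paper's synonym detection and incoherent-topic elimination.

```python
import re
from collections import Counter


def sentences(text: str) -> list[list[str]]:
    """Split text into sentences of lowercase word tokens (assumed preprocessing)."""
    return [re.findall(r"[a-z]+", s) for s in re.split(r"[.!?;:]", text.lower())]


def repeated_phrases(text: str, min_len: int = 2, max_len: int = 4,
                     min_count: int = 2) -> Counter:
    """Count word n-grams inside each sentence; keep those seen min_count+ times."""
    counts: Counter = Counter()
    for words in sentences(text):
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return Counter({p: c for p, c in counts.items() if c >= min_count})


def topic_candidates(text: str, **kwargs) -> list[tuple[str, int]]:
    """Rank repeated phrases by frequency, folding fragments into longer phrases."""
    cands = repeated_phrases(text, **kwargs)
    kept = {
        p: c for p, c in cands.items()
        # Hypothetical heuristic: drop p if a longer candidate contains it and
        # occurs equally often, treating p as a fragment of that phrase.
        if not any(q != p and f" {p} " in f" {q} " and cq == c
                   for q, cq in cands.items())
    }
    # Importance = frequency; ties broken by longer phrase, then alphabetically.
    return sorted(kept.items(), key=lambda kv: (-kv[1], -len(kv[0].split()), kv[0]))


if __name__ == "__main__":
    text = ("We index text documents. To index text documents we count phrases. "
            "Counting phrases helps index text documents.")
    print(topic_candidates(text))  # [('index text documents', 3)]
```

Without the fragment rule, "index text" and "text documents" would tie with "index text documents" at three occurrences each. Folding them into the longer phrase keeps the ranking closer to the abstract's own example, where the full phrase is the topic.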