International Conference on Information Systems and Computer Networks

A novel statistical and linguistic features based technique for keyword extraction

Abstract

The WWW is a decentralized, distributed and heterogeneous information resource. With the increasing availability of information through the WWW, it is very difficult to read every document to retrieve the desired results. There is therefore a need for summarization methods that can present the contents of a given document precisely. The keywords of a document can provide a compact representation of its content, and as a result various algorithms and systems for automatic keyword extraction have been proposed in the recent past. However, existing solutions require either trained models or domain-specific information to extract keywords automatically. To address these shortcomings, an innovative hybrid approach to automatic keyword extraction based on the statistical and linguistic features of a document has been proposed. The technique works on an individual document without any prior parameter adjustment and takes full advantage of all the features of the document to extract the keywords. The extracted keywords can then assist in domain-specific indexing. This paper also presents the performance of the proposed method in terms of Precision and Recall, compared with existing keyword extraction tools such as Dream web design.
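The abstract does not specify the paper's actual features or scoring. The following is only a minimal, hypothetical sketch of what a training-free, single-document hybrid extractor and its Precision/Recall evaluation could look like. It makes several assumptions. The linguistic step splits text into candidate phrases at punctuation and at a small stopword list (`STOPWORDS` is illustrative). The statistical score is phrase frequency times mean word frequency, plus a bonus for early first occurrence. Precision and Recall use exact, case-insensitive matching against a reference keyword list. The function names `candidate_phrases`, `extract_keywords` and `precision_recall` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "can", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def candidate_phrases(text):
    """Linguistic filter (assumed): split text at punctuation and stopwords into phrases."""
    phrases = []
    for sentence in re.split(r"[.!?,;:\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z][a-z\-]*", sentence):
            if word in STOPWORDS:
                # A stopword ends the current candidate phrase.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def extract_keywords(text, top_k=5):
    """Rank candidate phrases of one document; needs no training data or domain lexicon."""
    phrases = candidate_phrases(text)
    word_freq = Counter(w for p in phrases for w in p.split())
    phrase_freq = Counter(phrases)
    first_pos = {}
    for i, p in enumerate(phrases):
        first_pos.setdefault(p, i)  # index of the phrase's first occurrence

    scores = {}
    for p in phrase_freq:
        words = p.split()
        mean_word_freq = sum(word_freq[w] for w in words) / len(words)
        # Assumed statistical score: phrase frequency x mean word frequency,
        # plus a bonus that shrinks the later the phrase first appears.
        scores[p] = phrase_freq[p] * mean_word_freq + 1.0 / (1 + first_pos[p])

    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_k]


def precision_recall(extracted, reference):
    """Set-based Precision and Recall of extracted keywords against a reference list."""
    ext = {k.lower() for k in extracted}
    ref = {k.lower() for k in reference}
    hits = len(ext & ref)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    doc = "Keyword extraction is useful. Keyword extraction of a document is fast."
    keywords = extract_keywords(doc, top_k=3)
    print(keywords)  # ['keyword extraction', 'useful', 'document']
    print(precision_recall(keywords, ["keyword extraction", "document", "indexing"]))
```

On the sample sentence, "keyword extraction" ranks first because both the phrase and its words recur. The position bonus then orders the single-occurrence phrases. Two of the three extracted keywords match the three-item reference list, so Precision and Recall are both 2/3. A practical system would likely replace the stopword split with part-of-speech patterns and add further document features. How the paper combines its features is not stated in the abstract.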