Conference: National conference on emerging trends in computing communication

A New Hybrid Language Independent Extraction System

Abstract

Keyword extraction is the identification of thematic words that depict the overall theme of a document. This paper presents a new hybrid, language-independent keyword extraction system that combines the approaches of Bun and Ishizuka (Topic extraction from news archive using TF-PDF algorithm. In: Proceedings of the Third International Conference on Web Information Systems Engineering (WISE 02), pp. 73-82) and Lee and Kim (News keyword extraction for topic tracking. In: Proceedings of the Fourth International Conference on Networked Computing and Advanced Information Management, pp. 554-559 [3]). To identify key terms in a text, we use the NTF1-PSF and NTF2-PSF measures, which are modified and improved forms of the conventional TF-ISF (term frequency-inverse sentence frequency) measure. The first variant of TF, Normalized Term Frequency 1 (NTF1), normalizes a word's frequency in a sentence by the maximum term frequency in that sentence. NTF2 is computed by dividing the frequency of the word in each sentence by the total frequency of all words in that sentence and summing these ratios over all sentences. The proportional sentence frequency (PSF) of a word j in a document is the exponential of the ratio of the number of sentences containing j to the total number of sentences in the document. The final keyword set is the intersection of the NTF1-PSF and NTF2-PSF keyword sets, combined by union with the set of title keywords obtained through title keyword extraction. The reported efficiency of this language-independent hybrid keyword extraction system is 84.39%.
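The scoring and combination steps described above can be sketched in Python. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions the abstract does not state. Sentences are split on a small hypothetical set of end-of-sentence marks. Words are Unicode `\w+` tokens with no stop-word list, so nothing is language-specific. NTF1 is summed over sentences in the same way as NTF2. Each measure's keyword set is its top `k` words, where `k` is a hypothetical parameter. A title keyword is simply a title word that also occurs in the body. The helper names (`split_sentences`, `score_words`, `top_k`, `extract_keywords`) are illustrative.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict

# Assumption: sentence boundaries are '.', '!', '?', the Devanagari danda
# (U+0964) and the CJK full stop (U+3002); the abstract does not specify this.
SENTENCE_END = re.compile(r"[.!?\u0964\u3002]+")
# Unicode-aware word tokens; no stop-word list, so nothing is language-specific.
TOKEN = re.compile(r"\w+")


def split_sentences(text: str) -> list[list[str]]:
    """Split text into sentences, each a list of lower-cased tokens."""
    sentences = []
    for chunk in SENTENCE_END.split(text):
        tokens = TOKEN.findall(chunk.lower())
        if tokens:
            sentences.append(tokens)
    return sentences


def score_words(sentences: list[list[str]]) -> tuple[dict[str, float], dict[str, float]]:
    """Return (NTF1-PSF, NTF2-PSF) scores for every word in the document."""
    ntf1 = defaultdict(float)
    ntf2 = defaultdict(float)
    sentence_freq = Counter()
    for tokens in sentences:
        tf = Counter(tokens)
        max_tf = max(tf.values())
        total = len(tokens)
        for word, count in tf.items():
            ntf1[word] += count / max_tf  # assumption: summed over sentences like NTF2
            ntf2[word] += count / total   # frequency / all word occurrences in the sentence
            sentence_freq[word] += 1      # number of sentences containing the word
    n = len(sentences)
    # PSF = exp(sentences containing the word / total sentences), which lies in (1, e].
    psf = {w: math.exp(sentence_freq[w] / n) for w in sentence_freq}
    return ({w: ntf1[w] * psf[w] for w in psf},
            {w: ntf2[w] * psf[w] for w in psf})


def top_k(scores: dict[str, float], k: int) -> set[str]:
    """Hypothetical selection rule: the k best-scoring words, ties broken alphabetically."""
    return set(sorted(scores, key=lambda w: (-scores[w], w))[:k])


def extract_keywords(title: str, body: str, k: int = 10) -> set[str]:
    """Combine as (top-k NTF1-PSF & top-k NTF2-PSF) | title keywords."""
    sentences = split_sentences(body)
    if not sentences:
        return set()
    ntf1_psf, ntf2_psf = score_words(sentences)
    common = top_k(ntf1_psf, k) & top_k(ntf2_psf, k)
    # Assumption: a title keyword is any title word that also occurs in the body.
    body_words = {w for tokens in sentences for w in tokens}
    title_keywords = {w for w in TOKEN.findall(title.lower()) if w in body_words}
    return common | title_keywords


if __name__ == "__main__":
    body = "Solar power grows. Solar solar panels. Cheap power helps farms everywhere."
    print(sorted(extract_keywords("Cheap solar energy", body, k=3)))
    # ['cheap', 'power', 'solar']
```

In the usage example with `k=3`, NTF1-PSF ranks `cheap` third after alphabetical tie-breaking, while NTF2-PSF ranks `grows` third. The intersection therefore keeps only `solar` and `power`, and the title contributes `cheap`. Note that because PSF lies in (1, e], it acts as a bounded boost for words spread across many sentences. This is the opposite of inverse sentence frequency, which penalizes such words.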