New information theoretic measures for text extraction

Abstract

In this paper, we propose two information-theoretic measures, collocation information (CI) and discrimination threshold (DT), which quantify the relationship between collocating words. The CI of a word is derived from Shannon's measure of information, while DT is the result of a constrained optimization in CI. We study the properties of CI and DT and discuss their application to generating sentence extracts as summaries of text documents. Extracts generated with these measures are evaluated using automatic evaluation schemes, and their performance is compared with that of other known schemes.