Conference: The 2nd International Conference on Information Engineering and Computer Science

A Principle Component Analysis Based Method to Normalize Term Weights

Abstract

Term weighting is a significant step in document formalization for natural language processing, and it strongly affects the accuracy of natural language processing systems. A term weight consists of three parts: a global term weight, a local term weight, and a standardization factor. Many term weighting algorithms have been proposed for each part, and at present the final term weight is computed as the product of several such algorithms. However, the results of different term weighting algorithms are correlated with one another, which indicates that they carry redundant, overlapping information. Simply multiplying the results therefore leads to an inaccurate final term weight. This paper puts forward a Principal Component Analysis (PCA) based term weight normalization method, which removes the redundant, overlapping information and yields a more accurate final term weight.
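
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each term has been scored by several weighting schemes, standardizes those scores, and projects them onto the first principal component so that correlated schemes collapse into a single score. All function names and the toy scores are hypothetical, and the power iteration used here is a stand-in for a proper eigen-solver.

```python
import math


def standardize_columns(rows):
    """Center each column and scale it to unit (population) variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # A constant column has zero spread; divide by 1.0 instead to avoid 0/0.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]


def correlation_matrix(z):
    """Covariance of standardized columns, i.e. the correlation matrix."""
    n, m = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / n for b in range(m)]
            for a in range(m)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration for the dominant eigenvector of a symmetric matrix.

    The uniform start vector is adequate when the schemes are positively
    correlated; that is an assumption of this sketch, not a general solver.
    """
    m = len(matrix)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_weights(scores):
    """Project standardized scheme scores onto the first principal component."""
    z = standardize_columns(scores)
    v = leading_eigenvector(correlation_matrix(z))
    return [sum(zi * vi for zi, vi in zip(row, v)) for row in z], v


# Hypothetical scores for four terms under two schemes; scheme B is exactly
# twice scheme A, so the two columns carry the same information.
scores = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

product = [a * b for a, b in scores]       # baseline: multiply the schemes
combined, direction = pca_weights(scores)  # PCA: one score per term

print("product :", product)
print("combined:", [round(c, 3) for c in combined])
print("loading :", [round(d, 3) for d in direction])
```

In this toy case the product grows quadratically with the shared signal, because the same information is counted twice, while the PCA score stays linear in it. Note that the projected scores are centered, so some are negative; they would need rescaling before being used as weights, and the abstract does not say how the paper handles that step.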