A Novel Text Feature Weight Calculation Method Applied to Power Field

Abstract

Feature extraction is an important prerequisite for effective automatic text classification. The TF-IDF algorithm is widely used to compute text feature weights, but it does not capture how a term is dispersed across categories, so it cannot reflect the differences between categories. TF-IDF performs poorly in the power field because the focus and wording of news texts vary greatly between sub-fields. This paper therefore proposes a new text feature weighting algorithm for the power field, called TF-DFDP. TF-DFDP introduces four factors: FC (Frequency in Category), DC (Dispersion in Category), PS (Paragraph Span Factor) and CW (Category Weight Factor). Experimental results show that the new algorithm achieves higher precision, higher recall and a better F1 value.
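To illustrate the limitation of TF-IDF noted in the abstract, here is a minimal sketch of the classic TF-IDF weighting only. It is not the paper's TF-DFDP method, whose formulas the abstract does not give. The function name `tf_idf`, the toy corpus and the category labels are hypothetical, chosen purely for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (classic TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus of (category, tokens) pairs.
# The category labels are never passed to tf_idf.
corpus = [
    ("generation",   ["turbine", "outage", "report"]),
    ("generation",   ["turbine", "price", "report"]),
    ("transmission", ["grid", "outage", "report"]),
    ("transmission", ["grid", "price", "report"]),
]
weights = tf_idf([tokens for _, tokens in corpus])

# "turbine" occurs only in one category; "outage" is spread across both.
print(weights[0]["turbine"], weights[0]["outage"])
```

In this toy corpus "turbine" appears only in the "generation" documents, while "outage" appears once in each category. Both occur in two of the four documents, so TF-IDF gives them the same weight in the first document even though "turbine" is the more useful term for telling the categories apart. Category-aware factors such as those named in the abstract are presumably intended to close this kind of gap.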
