《软件学报》 (Journal of Software)

Document Classification Method Based on an Extended Corner Classification Neural Network (in English)

Abstract

CC4 (the fourth version of corner classification) is a new type of corner classification training algorithm for three-layer feedforward neural networks. It was originally used for document classification in the metasearch engine Anvish. When the documents are of similar size, the CC4 neural network classifies them well; when document sizes differ greatly, its classification performance is poor. To address this problem, this paper extends the original CC4 neural network so that it can classify such documents effectively. First, an MDS-NN based data indexing method is proposed that maps each document to a point in k-dimensional space while preserving as much of the distance information between the original documents as possible. Second, the index information is transformed into the 0/1 sequences that the CC4 neural network accepts, which extends the network to take document indexes as input. Experimental results show that, for documents whose sizes differ greatly, the extended CC4 neural network (ExtendedCC4) outperforms the original CC4 neural network (InitialCC4). The classification precision of ExtendedCC4 also depends closely on the document indexing method.
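The last two steps of the abstract (turning a k-dimensional index into a 0/1 sequence and feeding it to a CC4 network) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the index coordinates are taken to be already quantized to integers in `0..levels`, they are converted to bits with unary coding, and the network uses the CC4 prescriptive rule as commonly described in the literature (one hidden neuron per training sample, weights of +1/-1, bias `r - s + 1`, where `r` is the radius of generalization and `s` is the number of ones in the sample). The MDS-NN indexing step itself is not shown, and the function names and sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

HiddenNeuron = Tuple[List[int], int, int]  # (input weights, bias, output weight)


def unary_encode(point: Sequence[int], levels: int) -> List[int]:
    """Encode each quantized coordinate v (0..levels) as v ones then zeros.

    Assumption: unary coding is used here for illustration only. With it,
    the Hamming distance between two codes equals the L1 distance between
    the quantized points.
    """
    bits: List[int] = []
    for v in point:
        if not 0 <= v <= levels:
            raise ValueError(f"coordinate {v} outside 0..{levels}")
        bits.extend([1] * v + [0] * (levels - v))
    return bits


def train_cc4(samples: Sequence[Sequence[int]],
              labels: Sequence[int],
              radius: int) -> List[HiddenNeuron]:
    """Build one hidden neuron per training sample (no iterative training)."""
    net: List[HiddenNeuron] = []
    for x, y in zip(samples, labels):
        weights = [1 if b == 1 else -1 for b in x]  # +1 for 1-bits, -1 for 0-bits
        bias = radius - sum(x) + 1                  # r - s + 1
        out_w = 1 if y == 1 else -1                 # hidden-to-output weight
        net.append((weights, bias, out_w))
    return net


def predict_cc4(net: Sequence[HiddenNeuron], x: Sequence[int]) -> int:
    """A hidden neuron fires when x is within `radius` bits of its sample."""
    total = 0
    for weights, bias, out_w in net:
        activation = sum(w * b for w, b in zip(weights, x)) + bias
        if activation > 0:
            total += out_w
    return 1 if total > 0 else 0


# Hypothetical 2-D indexes, quantized to 0..4; label 1 and label 0 are two classes.
LEVELS = 4
train_points = [(1, 1), (0, 1), (4, 3), (3, 4)]
train_labels = [1, 1, 0, 0]
net = train_cc4([unary_encode(p, LEVELS) for p in train_points],
                train_labels, radius=1)

near_class_1 = predict_cc4(net, unary_encode((1, 2), LEVELS))
near_class_0 = predict_cc4(net, unary_encode((4, 4), LEVELS))
```

Under these assumptions, a query point activates a hidden neuron only when its quantized index lies within L1 distance `radius` of the corresponding training point, which is why the quality of the index (how well it preserves distances between documents) would directly affect classification precision.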
