The massive document collection at the Hainan Launch Center lacks effective organization and management, and a fast, effective method for automatically sorting and categorizing documents is urgently needed. To meet this need, a server-client aerospace text classification system was developed on the basis of Web technology and text classification technology, through Web server construction, training-text data collection, text preprocessing, text feature representation, and classification model training. In testing, both the precision and the recall of the system on the test data set exceeded 90%, indicating good classification performance.
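The stages named in the abstract (preprocessing, feature representation, classifier training, and evaluation by precision and recall) can be illustrated with a minimal sketch. The abstract does not say which features or classifier the system uses. The whitespace tokenizer, the bag-of-words counts, the multinomial naive Bayes model, and all names and sample documents below are assumptions chosen for illustration only, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Stand-in for the preprocessing step (assumption): lowercase and
    # split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                # Laplace (add-one) smoothing so unseen tokens do not
                # produce log(0).
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall(y_true, y_pred, positive):
    # Precision = TP / (TP + FP); recall = TP / (TP + FN), for one class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical training documents and category labels.
train_docs = [
    "rocket engine test report",
    "engine fuel valve inspection",
    "telemetry antenna signal log",
    "signal receiver antenna calibration",
]
train_labels = ["propulsion", "propulsion", "telemetry", "telemetry"]

model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict("fuel engine report")` returns a category label, and `precision_recall` scores a list of predictions against the true labels for one category at a time. Chinese documents would need a word segmenter in place of the whitespace split, since Chinese text does not separate words with spaces.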