Conference: International Russian automation conference

Automated Text Classification System Based on Statistical Unified Model

Abstract

This paper presents an automated text classification system based on a unified model. The main statistical and linguistic approaches to text mining are reviewed, and the architecture of the developed system is described and analyzed in detail. The unified model of text data consists of elementary statistical models: substrings, cumulative characteristics, and finite difference characteristics. The structure and features of each elementary model are considered. The unified model gains its flexibility from the weights assigned to each elementary model, so the system can be adapted to different text types by changing those weights. The automated system was tested on the classification of literary texts. Classification quality was evaluated by precision, recall, and F-measure. The model was trained and evaluated on 6 classes; the training set and test set contain 600 and 60 texts, respectively. The system shows good results, and the low scores obtained for some texts are explained. The advantages and limitations of the proposed system are discussed. In addition, the inclusion of linguistic models is identified as a direction for further research aimed at improving the classification quality of the proposed system.
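The abstract does not give the combination rule or the scoring formulas, so the following is only a minimal sketch under stated assumptions. It assumes that each elementary model produces a score per class and that the unified model combines them as a weighted sum; it then computes per-class precision, recall, and F-measure (taken here as the balanced F1). The function names, the model names used as dictionary keys, and all numeric values are hypothetical and do not come from the paper.

```python
def combine_scores(model_scores, weights):
    """Weighted sum of per-class scores from several elementary models.

    Assumption: the paper's actual combination rule is not stated in the
    abstract; a weighted sum is used here purely for illustration.
    """
    total = {}
    for model_name, scores in model_scores.items():
        weight = weights[model_name]
        for label, score in scores.items():
            total[label] = total.get(label, 0.0) + weight * score
    return total


def predict(model_scores, weights):
    """Return the class with the highest combined score."""
    combined = combine_scores(model_scores, weights)
    return max(combined, key=combined.get)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical scores for one text from two elementary models.
example_scores = {
    "substrings": {"A": 0.2, "B": 0.8},
    "cumulative": {"A": 0.9, "B": 0.1},
}

# Changing the weights changes which elementary model dominates the decision.
label_substring_heavy = predict(example_scores, {"substrings": 1.0, "cumulative": 0.5})
label_cumulative_heavy = predict(example_scores, {"substrings": 0.5, "cumulative": 1.0})
```

In this sketch, re-weighting the elementary models is the only change needed to shift the decision, which is one way to read the abstract's statement that the system can be modified for different text types.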
