A multi-layer text classification framework based on two-level representation model

Jiali Yun; Liping Jing; Jian Yu; Houkuan Huang



Abstract

Text categorization is one of the most common tasks in data mining and machine learning. Unlike structured data, unstructured text is difficult to analyze because it carries both syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) for text data: one level represents syntactic information and the other represents semantic information. At the syntactic level, each document is represented as a term vector in which the value of each component is the term frequency-inverse document frequency (TF-IDF) weight of the term. At the semantic level, each document is represented by the Wikipedia concepts related to the terms at the syntactic level. We also design a multi-layer classification framework (MLCLA) that exploits the syntactic and semantic information represented in 2RM. The MLCLA framework contains three classifiers. Two of them are applied in parallel, one at the syntactic level and one at the semantic level. Their outputs are combined and fed to the third classifier, which produces the final result. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that 2RM combined with MLCLA improves text classification performance compared with existing flat text representation models (Term-based VSM, Term Semantic Kernel Model, Concept-based VSM, Concept Semantic Kernel Model and Term + Concept VSM) combined with existing classification methods.
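
The two-level representation and the three-classifier layout described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the base classifiers, the combination rule, or the concept-mapping procedure, so the nearest-centroid classifier, the smoothed IDF formula, the toy `TERM_TO_CONCEPTS` dictionary (a stand-in for a real Wikipedia concept lookup), and the class name `TwoLevelStack` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical term -> concept map; a stand-in for a Wikipedia concept lookup.
TERM_TO_CONCEPTS = {
    "puck": ["Ice hockey"], "goalie": ["Ice hockey"], "rink": ["Ice hockey"],
    "orbit": ["Spaceflight"], "rocket": ["Spaceflight"], "launch": ["Spaceflight"],
}

def tokenize(text):
    return text.lower().split()

def to_concepts(tokens):
    """Semantic level: replace terms by the concepts they are related to."""
    return [c for t in tokens for c in TERM_TO_CONCEPTS.get(t, [])]

def fit_idf(docs):
    """Smoothed IDF per feature (assumed formula), from a list of token lists."""
    n = len(docs)
    df = Counter(f for d in docs for f in set(d))
    return {f: math.log((1 + n) / (1 + k)) + 1.0 for f, k in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; features unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {f: c * idf[f] for f, c in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class CentroidClassifier:
    """Assumed base learner: cosine similarity to per-class summed vectors."""
    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.centroids = {c: {} for c in self.classes}
        for vec, y in zip(vectors, labels):
            for f, w in vec.items():
                self.centroids[y][f] = self.centroids[y].get(f, 0.0) + w
        return self

    def scores(self, vec):
        return [cosine(vec, self.centroids[c]) for c in self.classes]

    def predict(self, vec):
        s = self.scores(vec)
        return self.classes[s.index(max(s))]

class TwoLevelStack:
    """Two parallel classifiers (terms, concepts) feeding a third classifier."""
    def fit(self, texts, labels):
        term_docs = [tokenize(t) for t in texts]
        concept_docs = [to_concepts(d) for d in term_docs]
        self.term_idf = fit_idf(term_docs)
        self.concept_idf = fit_idf(concept_docs)
        term_vecs = [tfidf(d, self.term_idf) for d in term_docs]
        concept_vecs = [tfidf(d, self.concept_idf) for d in concept_docs]
        self.term_clf = CentroidClassifier().fit(term_vecs, labels)
        self.concept_clf = CentroidClassifier().fit(concept_vecs, labels)
        # Third layer is trained on the combined outputs of the first two.
        meta = [self._meta(t, c) for t, c in zip(term_vecs, concept_vecs)]
        self.meta_clf = CentroidClassifier().fit(meta, labels)
        return self

    def _meta(self, term_vec, concept_vec):
        combined = self.term_clf.scores(term_vec) + self.concept_clf.scores(concept_vec)
        return dict(enumerate(combined))

    def predict(self, text):
        tokens = tokenize(text)
        term_vec = tfidf(tokens, self.term_idf)
        concept_vec = tfidf(to_concepts(tokens), self.concept_idf)
        return self.meta_clf.predict(self._meta(term_vec, concept_vec))

texts = ["the goalie stopped the puck", "skaters left the rink",
         "the rocket reached orbit", "launch of the rocket delayed"]
labels = ["hockey", "hockey", "space", "space"]
model = TwoLevelStack().fit(texts, labels)
print(model.predict("a puck slid across the rink"))
```

For brevity the third classifier here is trained on scores that the first two produce for their own training documents. A more careful setup would likely use held-out or cross-validated scores to limit overfitting, but the abstract does not say how the authors handled this.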


Bibliographic details

Source
Expert Systems with Applications | 2012, Issue 2 | pp. 2035-2046 | 12 pages
Authors
Jiali Yun; Liping Jing; Jian Yu; Houkuan Huang
Affiliation
School of Computer and Information Technology, Beijing Jiaotong University, China (all four authors)
Original format: PDF
Language: English
Keywords
text classification; text representation; multi-layer classification; Wikipedia; semantics

