Text classification using a hidden Markov model.


Abstract

Text categorization (TC) is the task of automatically assigning digital text documents to predefined categories by analyzing their contents. The purpose of this study is to develop an effective TC model that addresses the difficulty of automatic classification. The study has two primary goals. First, a Hidden Markov Model (HMM) is proposed as a relatively new method for text categorization. HMMs have been applied to a wide range of text-processing tasks, such as text segmentation and event tracking, information retrieval, and information extraction, but few studies have applied them to TC. Second, the Library of Congress Classification (LCC) is adopted as the classification scheme for the HMM-based TC model. LCC has been used in only a handful of automatic-classification experiments. In the proposed framework, a general prototype for an HMM-based TC model is designed, and an experimental model based on the prototype is implemented to categorize digitized documents into LCC classes. A sample of abstracts from the ProQuest Digital Dissertations database serves as the test base. Dissertation abstracts, which are pre-classified by professional librarians, form an ideal test base for evaluating the proposed automatic TC model. For comparison, a Naive Bayesian model, which has been used extensively in TC applications, is also implemented. The experimental results show that the proposed model outperforms the Naive Bayesian model when the automatic classification of abstracts is compared with the manual classification performed by professionals.
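The abstract does not describe how the HMM is built or trained, so the following is only a minimal sketch of one common way an HMM can act as a text classifier: keep one HMM per category, score a document's token sequence under each model with the forward algorithm, and pick the category with the highest likelihood. Everything in it is an assumption made for illustration: the function names, the two-state models, the tiny vocabulary, the hand-set probabilities, and the use of LCC main classes Q (Science) and P (Language and Literature) as labels. It is not the dissertation's design.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(tokens, start, trans, emit, floor=1e-6):
    """Return log P(tokens | HMM) using the forward algorithm in log space.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s]     : dict mapping a token to its emission probability in state s
    floor       : crude stand-in probability for unseen tokens (an assumption,
                  not a properly normalized smoothing scheme)
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    n = len(start)

    def log_emit(state, token):
        return math.log(emit[state].get(token, floor))

    # Initialization: start probability times emission of the first token.
    alpha = [math.log(start[s]) + log_emit(s, tokens[0]) for s in range(n)]
    # Recursion: sum over predecessor states, then emit the next token.
    for token in tokens[1:]:
        alpha = [
            log_sum_exp([alpha[p] + math.log(trans[p][s]) for p in range(n)])
            + log_emit(s, token)
            for s in range(n)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)


def classify(tokens, models):
    """Score tokens under every category's HMM; return (best_label, scores)."""
    scores = {
        label: forward_log_likelihood(tokens, start, trans, emit)
        for label, (start, trans, emit) in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models, hand-set for illustration only (not trained).
MODELS = {
    "Q": (  # LCC main class Q: Science
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [
            {"algorithm": 0.5, "model": 0.3, "data": 0.2},
            {"data": 0.5, "model": 0.3, "algorithm": 0.2},
        ],
    ),
    "P": (  # LCC main class P: Language and Literature
        [0.5, 0.5],
        [[0.6, 0.4], [0.5, 0.5]],
        [
            {"poem": 0.5, "novel": 0.3, "verse": 0.2},
            {"verse": 0.5, "novel": 0.3, "poem": 0.2},
        ],
    ),
}

label, scores = classify("algorithm model data".split(), MODELS)
```

In a real system the start, transition, and emission probabilities would be estimated from pre-classified training abstracts rather than set by hand, and the predicted labels would be compared against the librarians' manual classifications to measure performance.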


Bibliographic record

Author: Yi, Kwan
Author affiliation: McGill University (Canada)
Degree-granting institution: McGill University (Canada)
Subject: Information Science; Artificial Intelligence
Degree: Ph.D.
Year: 2005
Pages: 182
Original format: PDF
Language: English

