
Learning Material Classification


Abstract

The exponential growth of learning materials and their rapid distribution among e-learners over the Internet have made it nearly infeasible to manually review and categorize each document. In addition, the ability to classify objects into groups is highly important in many applications, such as text retrieval, query search, and learning recommender systems. This emerging need has re-emphasized the importance of automatic text classification systems, which classify semi-structured and unstructured text documents into predefined labeled groups. Specifically, categorizing learning resources by their content (rather than only by subject topic or user declaration) can help recommender systems and search engines automatically build repositories of relevant documents and present users with the most relevant ones based on the query and/or user preferences.

Text classification poses many challenges for learning systems, which now deal with huge numbers of texts of highly variable length, structure, and content. The representative features must be carefully selected to capture the important semantics of the text and yield acceptable classification performance while keeping the computational cost within a practical range. The features must also be useful across a wide range of class definitions.

In this study, we investigate the applicability of text mining algorithms for categorizing different text-based educational resources into curriculum-defined learning objectives. To do this, we will use a variant of the Term Frequency-Inverse Document Frequency (TF-IDF) feature selection method along with a majority-voting classification system composed of five different classical classifiers. Three knowledge domains with 65 learning objectives in total will be used in the experiments to evaluate the performance of the system. We will also study the effect of varying the number of features per label on system performance.

To deal with the rapid rise in the dimensionality of the feature vector as the system is extended (which adds computational burden and tends to limit the system's capacity to scale up), we will propose a hierarchical multitier classification architecture that can outperform a single-layer, single-node classification system in terms of computational cost and scalability. A simple version of this scheme will be implemented and analyzed. We will show experimentally that this architecture needs fewer features per label than the single-node classification system. Despite advantages such as easier scalability and lower maintenance cost, this multi-layer architecture could suffer from a higher initial setup cost.
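
The two building blocks named in the abstract, per-label TF-IDF feature selection and majority voting, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify the TF-IDF variant or the five classifiers, so the scoring below (pooling each label's documents and computing IDF across labels) and the names `tokenize`, `top_terms_per_label`, and `majority_vote` are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def top_terms_per_label(docs, labels, k):
    """Return {label: k highest-scoring terms} using a TF-IDF-style score.

    Assumption: all documents of a label are pooled into one
    pseudo-document, so IDF is computed across labels, not documents.
    """
    pooled = defaultdict(Counter)
    for text, label in zip(docs, labels):
        pooled[label].update(tokenize(text))

    n_labels = len(pooled)
    label_freq = Counter()  # number of labels in which each term occurs
    for counts in pooled.values():
        label_freq.update(counts.keys())

    selected = {}
    for label, counts in pooled.items():
        total = sum(counts.values())
        scores = {
            term: (tf / total) * math.log(n_labels / label_freq[term])
            for term, tf in counts.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[label] = ranked[:k]
    return selected


def majority_vote(predictions):
    """Return the label predicted most often.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    docs = [
        "loops iterate over lists",
        "loops repeat code",
        "recursion calls itself",
        "recursion needs base case",
    ]
    labels = ["iteration", "iteration", "recursion", "recursion"]
    print(top_terms_per_label(docs, labels, k=1))
    # Hypothetical outputs of five classifiers for one document:
    print(majority_vote(
        ["recursion", "iteration", "recursion", "recursion", "iteration"]
    ))
```

Here `k` plays the role of the number of features per label, the quantity the study varies. With an odd number of voters and two candidate labels a tie cannot occur, but with more labels it can, which is why the tie-breaking rule is stated explicitly.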

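Bibliographic record

  • Author affiliation: Illinois State University
  • Degree-granting institution: Illinois State University
  • Subjects: Information science; Artificial intelligence; Information technology
  • Degree: M.S.
  • Year: 2017
  • Pages: 63 p.
  • Total pages: 63
  • Original format: PDF
  • Language of text: English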
