Information retrieval: A framework for recommending text-based classification algorithms.

Abstract

Classification is one of the central issues in information retrieval systems that deal with text data. The need for effective approaches has increased dramatically with the advent of the World Wide Web and massive digital libraries. Effective methods are invaluable for exploring information repositories with the aim of discovering similarities between groups of text-based documents.

One goal of this thesis is to develop tools that support users of machine learning and data mining algorithms in the area of text classification. Interest in such technology is growing rapidly, but tools for end-users who are not experts remain limited, because machine learning systems are difficult to design and their number keeps increasing. As a result, system designers face two major research problems: algorithmic model selection and model combination, i.e., (a) selecting the most suitable model or algorithm for a given application, and (b) integrating it with useful and effective transformations of the data. Traditionally, these problems are resolved by trial and error or by consulting experts. The first solution is time-consuming and unreliable. The second is expensive and biased by the expert's own prejudices and preferences. This thesis develops a meta-model framework called the Regression Model Framework (RMF) that supports system designers with model selection and method combination. RMF uses statistical regression analysis to combine prior meta-knowledge with meta-level learning.

The second major goal of this thesis is to investigate how text classification is performed on the Web. A large number of text-based documents are available on the Internet and in corporate intranets, and categorizing them into useful semantic categories is a rewarding and challenging research problem. However, current approaches to text categorization on the Web mostly rely on simple representation schemes based on word occurrence and word frequency. They usually neglect the structural information inherent in Web documents. In analyzing Web documents, this thesis investigates the relative importance of hypertext tags in predicting the relevance of unknown documents.
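The abstract does not describe RMF's internals. The following is therefore only a generic illustration of regression-based algorithm recommendation, not RMF itself.

The sketch makes several assumptions. Each previously seen dataset is described by a vector of numeric meta-features, and the accuracy of each candidate classifier on those datasets is already known. One ordinary least-squares model is fitted per classifier, and candidates are ranked by their predicted accuracy on a new dataset. The function names (`fit_ols`, `recommend`), the classifier labels and the toy accuracy values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: need more (or more varied) past datasets")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; weights[0] is the intercept."""
    rows = [[1.0, *f] for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one meta-feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def recommend(
    past_meta: Sequence[Sequence[float]],
    past_accuracy: Dict[str, Sequence[float]],
    new_meta: Sequence[float],
) -> List[Tuple[str, float]]:
    """Rank candidate algorithms by regression-predicted accuracy on a new dataset."""
    scores = {
        algo: predict(fit_ols(past_meta, acc), new_meta)
        for algo, acc in past_accuracy.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up numbers: each row holds two hypothetical meta-features of one past dataset.
    past_meta = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]]
    past_accuracy = {
        "naive_bayes": [0.72, 0.79, 0.71, 0.78],
        "svm": [0.73, 0.79, 0.82, 0.88],
    }
    print(recommend(past_meta, past_accuracy, [5.0, 1.0]))
```

A per-algorithm linear model keeps the sketch small and easy to inspect. A real system would likely need more past datasets than meta-features, plus some regularization, before the predictions are trustworthy.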

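The abstract contrasts plain word-occurrence and word-frequency representations with representations that exploit hypertext structure. The sketch below, built on Python's standard `html.parser`, counts each word with a weight. That weight comes from the most heavily weighted tag enclosing the word.

The weights in `DEFAULT_TAG_WEIGHTS` are hypothetical placeholders, not values reported in the thesis. The class and function names are likewise illustrative. Passing an empty weight mapping reduces the representation to ordinary word frequency.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical per-tag weights (placeholders, NOT findings of the thesis).
DEFAULT_TAG_WEIGHTS: Dict[str, float] = {
    "title": 4.0, "h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5, "a": 1.5,
}
# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TagWeightedCounter(HTMLParser):
    """Accumulate term counts, weighting each word by its strongest enclosing tag."""

    def __init__(self, tag_weights: Dict[str, float]):
        super().__init__()
        self.tag_weights = tag_weights
        self.open_tags: List[str] = []
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags and self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in ("script", "style") for t in self.open_tags):
            return  # skip non-content text
        weight = max((self.tag_weights.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.counts[word] += weight


def tag_weighted_tf(html: str, tag_weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return a tag-weighted term-frequency vector for one HTML document."""
    parser = TagWeightedCounter(DEFAULT_TAG_WEIGHTS if tag_weights is None else tag_weights)
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


if __name__ == "__main__":
    page = ("<html><head><title>Data mining</title></head>"
            "<body><p>Text mining of web data.</p></body></html>")
    print(tag_weighted_tf(page))      # title words count more
    print(tag_weighted_tf(page, {}))  # uniform weights: plain word frequency
```

The resulting vectors can be fed to any standard text classifier. Learning or comparing the per-tag weights is the kind of question the abstract says the thesis investigates.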

Bibliographic details

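Author: Saleeb, Hany
Author affiliation: Pace University
Degree-granting institution: Pace University
Subject: Computer Science; Information Science
Degree: D.P.S.
Year: 2002
Pages: 233
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology; Information and knowledge dissemination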

Similar literature

1. Supervised framework for automatic recognition and retrieval of interaction: a framework for classification and retrieving videos with similar human interactions [J]. C. Chattopadhyay, S. Das. IET Computer Vision, 2016, no. 3.

2. Modeling User Preferences in Recommender Systems: A Classification Framework for Explicit and Implicit User Feedback [J]. Gawesh Jawaheer, Peter Weller, Patty Kostkova. ACM Transactions on Interactive Intelligent Systems, 2014, no. 2.

3. Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework [J]. Ghodratnama Samaneh, Moghaddam Hamid Abrishami. Pattern Analysis and Applications, 2021, no. 1.

4. A Conceptual Framework for Automatic Text-Based Indexing and Retrieval in Digital Video Collections [C]. Mohammed Belkhatir, Mbarek Charhad. International Conference on Database and Expert Systems Applications, 2007.

5. Bayesian frameworks for deformable pattern classification and retrieval: Application to handwriting recognition [D]. Cheung, Kwok-Wai. 1999.

6. An evaluation of computer assisted clinical classification algorithms [O]. C. G. Chute, Y. Yang, J. Buntrock. 1994.

7. Text-based Hierarchical Image Classification and Retrieval of Stock Photography [O]. Anna Bjarnestam. 1998.
