Creating a criterion-based information agent through data mining for automated identification of scholarly research on the World Wide Web.

Abstract

This dissertation creates an information agent that correctly identifies Web pages containing scholarly research approximately 96% of the time. The agent analyzes a Web page against a set of criteria and then uses a classification tree to arrive at a decision.

The criteria were gathered from the literature on selecting print and electronic materials for academic libraries. A Delphi study was conducted with an international panel of librarians to expand and refine the criteria until a list of 41 operationalizable criteria was agreed upon. A Perl program was then designed to analyze a Web page and determine a numerical value for each criterion.

A large collection of Web pages was gathered, comprising 5,000 pages that contain the full work of scholarly research and 5,000 random pages, representative of user searches, that do not contain scholarly research. Datasets were built by running the Perl program on these Web pages. The datasets were split into model-building and testing sets.

Data mining was then used to create different classification models. Four techniques were used: logistic regression, non-parametric discriminant analysis, classification trees, and neural networks. The models were created with the model-building datasets and then tested against the test dataset. Precision and recall were used to judge the effectiveness of each model. In addition, a set of pages that were difficult to classify because of their similarity to scholarly research was gathered and classified with the models.

The classification tree created the most effective classification model, with a precision of 96% and a recall of 95.6%. However, logistic regression created a model that was able to correctly classify more of the problematic pages.

This agent can be used to create a database of scholarly research published on the Web. In addition, the technique can be used to create a database of any type of structured electronic information.
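The pipeline described in the abstract (score a page on numeric criteria, classify it, then judge the classifier by precision and recall) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three criteria in `score_page`, the hand-written rule in `classify` (a stand-in for a learned classification tree), and the six sample pages are hypothetical and are not taken from the dissertation, whose 41 criteria and Perl program are not reproduced in this record.

```python
import re

def score_page(text):
    """Return a numeric value for each criterion (criteria are hypothetical)."""
    lower = text.lower()
    return {
        "has_abstract": int("abstract" in lower),
        "has_references": int(bool(re.search(r"\b(references|bibliography)\b", lower))),
        "word_count": len(re.findall(r"\w+", lower)),
    }

def classify(features):
    """Hand-written stand-in for a learned classification tree (assumed rule)."""
    if features["has_references"]:
        return features["has_abstract"] == 1 or features["word_count"] > 2000
    return False

def precision_recall(actual, predicted):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)          # scholarly, flagged scholarly
    fp = sum((not a) and p for a, p in pairs)    # not scholarly, flagged scholarly
    fn = sum(a and (not p) for a, p in pairs)    # scholarly, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up test set: (page text, is it scholarly research?)
pages = [
    ("Abstract We study retrieval. References [1] Salton 1983.", True),
    ("Buy shoes now. Great prices.", False),
    ("Course syllabus. References: textbook list.", False),
    ("Abstract of a talk. Bibliography available on request.", False),
    ("We present results of an experiment on retrieval.", True),
    ("Abstract: we evaluate classifiers. Bibliography: Smith 1998.", True),
]

actual = [label for _, label in pages]
predicted = [classify(score_page(text)) for text, _ in pages]
precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The fourth sample page resembles scholarly research without being research, which is the kind of hard-to-classify case the abstract mentions; in this toy rule it lowers precision, while the fifth page (research with no reference list) lowers recall.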

Bibliographic record

Author: Nicholson, Scott Richard
Author affiliation: University of North Texas
Degree-granting institution: University of North Texas
Subjects: Mathematics; Information Science; Computer Science
Degree: Ph.D.
Year: 2000
Pages: 100 p.
Total pages: 100
Original format: PDF
Language of text: English (eng)

