An innovative approach to classify and retrieve text documents using feature extraction and Hierarchical clustering based on ontology

Abstract

Data retrieval is a key process for acquiring information as required, and the need for proper information has increased. The most basic tool that provides this service is the browser, which traverses data according to the user's query and returns search results containing all related information. Finding the required information among these results is therefore time-consuming. This paper focuses on content-based data mining using an ontology and text feature extraction. Content-based data mining focuses on the domain of the data, and an ontology, being itself a domain-based information system over a data set, helps achieve the required data retrieval more appropriately. The proposed system uses the k-means clustering algorithm to create flat clusters: the primary clusters of the data, which are then used for hierarchical clustering. For the hierarchical stage, the proposed system uses the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). This clustering technique is fast and gives more accurate results. To make retrieval more appropriate, the system also uses a text feature extraction algorithm that reduces noisy data in the data sets, since noise-free data leads to a better retrieval process. The implemented system works with various text file types, such as PDF, .txt, DOC, and DOCX, and is also compatible with other file types, such as web pages and images.
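The abstract describes a pipeline of text feature extraction (noise reduction), k-means flat clustering, and cluster-based retrieval. The Python sketch below illustrates only those generic steps, under loudly stated assumptions: a small hypothetical stop-word list and a length filter stand in for the paper's feature-extraction algorithm, documents are compared by cosine similarity of raw term frequencies, and k-means is seeded farthest-first. It does not implement HFRECCA or model an ontology, and the names `extract_features`, `kmeans`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list used as a simple noise filter (an assumption,
# not the authors' feature-extraction algorithm).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def extract_features(text):
    """Lowercase, tokenize, and keep non-stop-words longer than two letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of several sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, iterations=10):
    """Flat k-means clustering of documents under cosine similarity."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    # Farthest-first seeding (an assumption): start from document 0, then
    # repeatedly add the document least similar to every chosen seed.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [dict(vectors[s]) for s in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: recompute each centroid; keep the old one if a cluster empties.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids


def retrieve(query, docs, vectors, labels, centroids):
    """Route the query to its closest flat cluster, then rank that cluster's documents."""
    q = extract_features(query)
    best = max(range(len(centroids)), key=lambda j: cosine(q, centroids[j]))
    members = [i for i, label in enumerate(labels) if label == best]
    members.sort(key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in members]


docs = [
    "Ontology based retrieval of text documents",
    "Ontology concepts guide document retrieval",
    "Hierarchical clustering groups similar clusters",
    "Clustering algorithms build hierarchical clusters",
]
vectors = [extract_features(d) for d in docs]
labels, centroids = kmeans(vectors, k=2)
print(labels)
print(retrieve("ontology based text retrieval", docs, vectors, labels, centroids))
```

Routing the query to a single flat cluster means only that cluster's documents are scored, which is one plausible reading of how clustering shortens retrieval. In the paper, the HFRECCA hierarchical stage would sit between these flat clusters and the retrieval step.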

Bibliographic record

Source
International Conference on Computing, Analytics and Security Trends | 2016 | 1 v. | 6 pages
Authors
Aradhana R. Patil; Amrita A. Manjrekar
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords
Feature extraction; Ontologies; Clustering algorithms; Data mining; Partitioning algorithms; Algorithm design and analysis; Browsers

Similar literature

1. A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification [J]. Mohamed K. Elhadad, Khaled M. Badran, Gouda I. Salama. International Journal of Software Innovation, 2018, No. 1.

2. An Approach Based on Iterative Learning Algorithm for Chinese Text Hierarchy Feature Extraction without Lexicon [J]. Shaohua Jiang. Journal of Theoretical and Applied Information Technology, 2013, No. 1.

3. Ontology Based Text Document Clustering for Sports [J]. A. Sudha Ramkumar, B. Poorna, B. Saleena. Journal of Engineering & Applied Sciences, 2018, No. 11.

4. An innovative approach to classify and retrieve text documents using feature extraction and Hierarchical clustering based on ontology [C]. Aradhana R. Patil, Amrita A. Manjrekar. 2016 International Conference on Computing, Analytics and Security Trends, 2016.

5. An ontology-driven concept-based information retrieval approach for Web documents [D]. Zhan Li. 2010.

6. Thematic clustering of text documents using an EM-based approach [O]. Sun Kim, W. John Wilbur. 2012.

7. Document Clustering Using Agglomerative Hierarchical Clustering Approach (AHDC) and Proposed TSG Keyword Extraction Method [O]. R. Nagarajan. 2016.

8. Ontology-Based Information Extraction from Free-Form Text [R]. R. Braun. 2000.

