An innovative approach to classify and retrieve text documents using feature extraction and Hierarchical clustering based on ontology

Abstract

Data retrieval is the key process of acquiring information on demand, and the need for relevant information has increased. The most basic tool providing this service is the browser: it traverses the data according to the user's query and returns all related information as search results. Finding the required information therefore becomes a time-consuming process. This paper focuses on content-based data mining using ontology and text feature extraction. Content-based data mining concentrates on the domain of the data. An ontology is itself a domain-based information system, which helps to retrieve the required data more appropriately. The proposed system uses the k-means clustering algorithm to create flat clusters. Flat clusters are the primary classification of the data and serve as input to hierarchical clustering, for which the proposed system uses the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). This clustering technique is very fast and gives more accurate results. For more appropriate data retrieval, the system uses a text feature extraction algorithm, which helps to reduce noisy data in the data sets; noise-free data supports a better retrieval process. The implemented system works with various types of text files, such as PDF, .txt, DOC, and DOCX. It is also compatible with other types of files, such as web pages and images.
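
The pipeline outlined in the abstract (feature extraction to reduce noise, k-means for flat clusters, then a hierarchy built over those clusters) can be illustrated with a small sketch. This is not the authors' implementation: the stop-word list, the TF-IDF weighting, the farthest-point initialization, and all function names are assumptions made for illustration, and plain centroid-linkage merging stands in for HFRECCA, whose details the abstract does not give.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def extract_features(docs):
    """Tokenize, drop stop words, and return L2-normalized TF-IDF vectors."""
    tokenized = [[t for t in re.findall(r"[a-z]+", d.lower())
                  if t not in STOP_WORDS] for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed weighting, not taken from the paper).
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Flat clustering: k-means with deterministic farthest-point seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(
            vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

def build_hierarchy(centroids):
    """Merge the two closest groups of flat clusters until one remains.

    Centroid linkage is a simple stand-in here; it is NOT HFRECCA.
    Returns the merges in order as pairs of flat-cluster id lists.
    """
    groups = {i: ([i], c) for i, c in enumerate(centroids)}
    merges = []
    while len(groups) > 1:
        ids = list(groups)
        a, b = min(((p, q) for i, p in enumerate(ids) for q in ids[i + 1:]),
                   key=lambda pq: sq_dist(groups[pq[0]][1], groups[pq[1]][1]))
        (ma, ca), (mb, cb) = groups.pop(a), groups.pop(b)
        wa, wb = len(ma), len(mb)  # weights = number of flat clusters merged
        merged = [(wa * x + wb * y) / (wa + wb) for x, y in zip(ca, cb)]
        groups[max(ids) + 1] = (ma + mb, merged)
        merges.append((sorted(ma), sorted(mb)))
    return merges

# Made-up sample documents, for illustration only.
docs = [
    "Ontology based retrieval of text documents",
    "Ontology driven document retrieval",
    "Football match results and sports scores",
    "Sports scores for the football season",
    "Clustering algorithms for data mining",
    "Data mining with clustering algorithms",
]
vectors = extract_features(docs)
labels, centroids = kmeans(vectors, k=3)
merges = build_hierarchy(centroids)
print(labels)
print(merges)
```

On this toy input the documents fall into three flat clusters by topic, and the hierarchy step then merges those clusters pairwise. The cluster numbering depends on the seeding order, so only the grouping is meaningful.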

Bibliographic details

Source: 2016 International Conference on Computing, Analytics and Security Trends | 2016 | pp. 371-376 | 6 pages
Conference location: Pune (IN)
Authors: Aradhana R. Patil; Amrita A. Manjrekar
Author affiliation (listed for both authors): Computer science and technology department, Department of technology, Kolhapur, India

Original format: PDF
Language: English
Keywords: Feature extraction; Ontologies; Clustering algorithms; Data mining; Partitioning algorithms; Algorithm design and analysis; Browsers

