International Conference on Computing, Analytics and Security Trends

An innovative approach to classify and retrieve text documents using feature extraction and Hierarchical clustering based on ontology


Abstract

Data retrieval is the process of acquiring information that meets a stated requirement, and the need for accurate information has grown. The most basic tool that provides this service is the browser: it traverses the data according to the user's query and returns every related result. Finding the required information among these results is therefore time consuming. This paper focuses on content-based data mining using ontology and text feature extraction. Content-based data mining concentrates on the domain of the data. An ontology is itself a domain-based information system, which helps retrieve the required data more appropriately. The proposed system uses the k-means clustering algorithm to create flat clusters. Flat clusters are the primary classification of the data and serve as the input to hierarchical clustering. For hierarchical clustering, the proposed system uses the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). This clustering technique is fast and gives more accurate results. To make retrieval more appropriate, the system applies a text feature extraction algorithm, which helps reduce noisy data in the data sets; noise-free data in turn supports a better retrieval process. The implemented system works on various types of text files, such as PDF, .txt, DOC, and DOCX. It is also compatible with other types of files, such as web pages and images.
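The pipeline outlined in the abstract (feature extraction, flat k-means clusters, then a hierarchy built on those clusters) can be illustrated with a minimal sketch. Several parts are assumptions, not the paper's method: the abstract does not name its feature extraction algorithm, so TF-IDF with stop-word removal is used here; HFRECCA is not reproduced, and a simple agglomerative merge of the closest k-means centroids stands in for it. The stop-word list, the sample documents, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on", "with"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words (simple noise reduction)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))
    n = len(docs)
    vectors = []
    for doc in tokens:
        tf = Counter(doc)
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, seeds, iterations=20):
    """Plain k-means; `seeds` are indices of documents used as initial centroids."""
    centroids = [vectors[i] for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return labels, centroids

def agglomerate(centroids):
    """Stand-in for HFRECCA: repeatedly merge the two closest groups of flat
    clusters and return the merge order as pairs of flat-cluster id tuples."""
    groups = {i: ([i], c) for i, c in enumerate(centroids)}
    merges = []
    while len(groups) > 1:
        keys = sorted(groups)
        a, b = min(((p, q) for i, p in enumerate(keys) for q in keys[i + 1:]),
                   key=lambda pq: sq_dist(groups[pq[0]][1], groups[pq[1]][1]))
        merges.append((tuple(groups[a][0]), tuple(groups[b][0])))
        ids = groups[a][0] + groups[b][0]
        # Represent the merged group by the unweighted mean of its flat centroids.
        groups[a] = (ids, mean([centroids[i] for i in ids]))
        del groups[b]
    return merges

# Hypothetical sample documents.
docs = [
    "Ontology based retrieval of text documents",
    "Text documents retrieval with ontology concepts",
    "K-means clustering builds flat clusters",
    "Hierarchical clustering merges flat clusters",
    "Image files and web pages",
    "Web pages with image content",
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, seeds=[0, 2, 4])
merges = agglomerate(centroids)
print(labels)
print(merges)
```

Seeding k-means with fixed document indices keeps the sketch deterministic; a real system would more likely use a randomised or k-means++ style initialisation and an established library instead of hand-written loops.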
