Published in: 2016 International Conference on Computing, Analytics and Security Trends

An innovative approach to classify and retrieve text documents using feature extraction and Hierarchical clustering based on ontology

Abstract

Data retrieval is the key process of acquiring information that meets a given requirement, and the need for accurate information has grown. The most basic tool providing this service is the browser: it traverses the data according to the user's query and returns all related information as search results. Finding the required information in those results is therefore time-consuming. This paper focuses on content-based data mining using ontology and text feature extraction. Content-based data mining concentrates on the domain of the data, and an ontology is itself a domain-based information system, so it helps retrieve the required data more appropriately. The proposed system uses the k-means clustering algorithm to create flat clusters. These flat clusters are the primary classification of the data and serve as the input to hierarchical clustering. For the hierarchical step, the system uses the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). This clustering technique is very fast and gives more accurate results. To make retrieval more appropriate, the system also applies a text feature extraction algorithm that reduces noisy data in the data sets; noise-free data supports a better retrieval process. The implemented system works with several text file types, including PDF, .txt, DOC, and DOCX, and is also compatible with other file types such as web pages and images.
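The pipeline described in the abstract (feature extraction, flat k-means clusters, then a hierarchy built on top of them) can be sketched roughly as follows. This is only an illustration under several assumptions: the abstract does not name the feature extraction algorithm, so TF-IDF with a small hypothetical stop-word list stands in for it; HFRECCA is not implemented here, and a plain agglomerative merge of the k-means centroids is swapped in for the hierarchical step; the ontology component is omitted. All function names and the sample documents are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on"}

def extract_features(text):
    """Tokenize, lowercase, and drop stop words (a stand-in for noise reduction)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector per document (assumed feature model)."""
    counts = [extract_features(d) for d in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    idf = {t: math.log(n / sum(1 for c in counts if t in c)) + 1.0 for t in vocab}
    vectors = []
    for c in counts:
        v = [c[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids so runs are repeatable."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def agglomerate(centroids):
    """Merge the two closest clusters until one remains (NOT HFRECCA).

    Returns the merge history as (id_a, id_b) pairs; each merged cluster
    receives the next unused integer id.
    """
    clusters = dict(enumerate(centroids))
    merges = []
    next_id = len(centroids)
    while len(clusters) > 1:
        ids = sorted(clusters)
        a, b = min(((p, q) for i, p in enumerate(ids) for q in ids[i + 1:]),
                   key=lambda pq: sq_dist(clusters[pq[0]], clusters[pq[1]]))
        ca, cb = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = [(x + y) / 2 for x, y in zip(ca, cb)]
        merges.append((a, b))
        next_id += 1
    return merges

# Invented sample documents covering three topics.
docs = [
    "Ontology models describe domain concepts and relations",
    "K-means clustering groups documents into flat clusters",
    "Football players scored goals in the match",
    "Domain ontology concepts guide retrieval",
    "Hierarchical clustering merges flat clusters of documents",
    "The football match ended with two goals",
]
labels, centroids = kmeans(tfidf_vectors(docs), k=3)
merges = agglomerate(centroids)
print(labels)
print(merges)
```

In this toy run, documents on the same topic end up with the same flat-cluster label, and the merge history lists which flat clusters are joined first when the hierarchy is built. Seeding k-means with the first k documents is a simplification chosen for repeatability; a real system would likely use a more robust initialisation.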