International Journal of Parallel Programming

Charismatic Document Clustering Through Novel K-Means Non-negative Matrix Factorization (KNMF) Algorithm Using Key Phrase Extraction

Abstract

A persistent challenge of Big Data is storing the required data and retrieving it through search engines.

Problem Definition: Many organizations need quick and efficient retrieval of useful information. The basic idea is to arrange an organization's files into individual folders in a hierarchical order. Doing this manually requires knowing the contents and names of the files in order to characterize them, so that related files can be grouped together.

Problem Statement: Manual grouping of files has its own complications, for example when the files are very numerous or when their contents cannot be identified from their names. Automated document clustering is therefore strongly needed to obtain useful results.

Existing System: A few researchers have proposed dynamic algorithms and comprehensive comparisons of existing algorithms, but these have so far been confined to organizations and colleges. Recently updated rules for NMF have raised interest in document clustering. These rules give confidence in its performance, with better results than Latent Semantic Indexing with Singular Value Decomposition.

Proposed System: A new working model called Novel K-means Non-negative Matrix Factorization (KNMF) is implemented using the updated rules of NMF, which have been examined for clustering documents. The Newsgroup20 data set is used for the experiments. In the preprocessing step that supports KNMF, common clutter and stop words are removed using keywords from the Key Phrase Extraction Algorithm, and a newly proposed Iterated Lovins stemmer is applied. Compared with the Porter and Lovins stemmers, the Iterated Lovins algorithm provides 5% more reduction. 60% of the document terms are reduced to their roots, as the remaining terms are already root words. Finally, an application of these processes, named "Progressive Text mining radical", is developed alongside the K-Means algorithm from the Apache Mahout project, which is used to analyze the performance of the MapReduce framework in Hadoop.
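
The abstract does not state the KNMF update equations, so the following is only a minimal sketch of generic NMF-based document clustering, not the authors' algorithm. It assumes the standard Lee-Seung multiplicative update rules for the Frobenius objective, a random starting point, and a tiny hand-made term-document matrix. The names `nmf_multiplicative`, `cluster_documents`, and `V` are hypothetical and chosen for illustration.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (terms x documents) as W @ H.

    Uses Lee-Seung multiplicative updates for the squared Frobenius error.
    This is an assumed, generic NMF variant, not the KNMF rules of the paper.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Strictly positive random start; the updates keep every entry nonnegative.
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each topic column of W has unit length; W @ H is unchanged.
    norms = np.linalg.norm(W, axis=0) + eps
    return W / norms, H * norms[:, None]


def cluster_documents(H):
    """Assign each document (column of H) to its highest-weight topic."""
    return H.argmax(axis=0)


# Toy term-document counts (assumption): documents 0-1 use terms 0-2,
# documents 2-3 use terms 3-5.
V = np.array([
    [2.0, 3.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 3.0, 2.0],
])

W, H = nmf_multiplicative(V, k=2)
labels = cluster_documents(H)
print(labels)
```

Cluster indices are arbitrary: which of the two groups receives label 0 depends on the random start. In a real pipeline, `V` would be built from the stop-word-filtered, stemmed terms described above rather than typed in by hand.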
