Text Mining Untuk Pencarian Dokumen Bahasa Inggris Menggunakan Suffix Tree Clustering

Machine translation: Text Mining for Searching English Documents Using Suffix Tree Clustering

Abstract

A search over a document collection generally returns excerpts of the matching documents, arranged by rank in a long list. It is not uncommon for a search to return tens or even hundreds of document snippets, forcing the user to scroll up and down to examine the snippets one by one. This makes it difficult for the user to determine which documents are relevant to the desired topic. In this final project, a web-based application for grouping documents was developed using the suffix tree clustering method. The basic concept of this method is to group the documents in the search results into clusters based on the words or phrases they contain. The application takes a search query as input and outputs clusters containing the corresponding documents. These clusters can be hierarchical, depending on the words or phrases that distinguish documents under the same parent cluster. The generated clusters are displayed to the user. When a cluster at the last level is selected, the application displays its collection of documents, each shown as a title and a snippet. With this method, search results are expected to be easier to browse. Keywords: text mining, suffix tree, suffix tree clustering, document grouping.
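The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: it assumes plain-text snippets, swaps the suffix tree for a dictionary of word n-grams (which finds the same shared phrases up to a fixed length, less efficiently), and produces flat rather than hierarchical clusters. The names `tokenize`, `base_clusters`, `merge_clusters`, and `STOPWORDS`, the overlap threshold of 0.5, and the sample snippets are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "is"}
PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def base_clusters(snippets, max_len=3, min_docs=2):
    """Map each shared phrase (word n-gram) to the snippets containing it.

    A suffix tree would expose these shared phrases directly; enumerating
    n-grams up to max_len words is a simpler stand-in.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = tokenize(text)
        for start in range(len(words)):
            for length in range(1, max_len + 1):
                phrase = tuple(words[start:start + length])
                if len(phrase) < length:
                    break  # ran past the end of the snippet
                if all(w in STOPWORDS for w in phrase):
                    continue  # skip phrases made only of stopwords
                phrase_docs[phrase].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge_clusters(base, threshold=0.5):
    """Merge base clusters whose document sets overlap in both directions."""
    phrases = list(base)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(base[a] & base[b])
        if common / len(base[a]) > threshold and common / len(base[b]) > threshold:
            parent[find(a)] = find(b)

    merged = defaultdict(lambda: {"phrases": [], "docs": set()})
    for p in phrases:
        root = find(p)
        merged[root]["phrases"].append(" ".join(p))
        merged[root]["docs"] |= base[p]
    return list(merged.values())


# Made-up snippets standing in for search results.
snippets = [
    "suffix tree clustering of search results",
    "suffix tree construction algorithm",
    "cooking pasta recipes",
    "easy pasta recipes for dinner",
]
clusters = merge_clusters(base_clusters(snippets))
for cluster in clusters:
    print(sorted(cluster["docs"]), cluster["phrases"])
```

Each resulting cluster carries the phrases that label it and the indices of the snippets it contains, which is the information a results page would need to show a cluster list and, on selection, the titles and snippets inside it.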
