Conference: International Conference on Innovations in Computer Science and Engineering

Text Clustering and Text Summarization on the Use of Side Information

Abstract

Clustering algorithms group data points by similarity in order to extract useful information from them. Text collections to be clustered typically contain a huge amount of information, and it is difficult to estimate the relative importance of that information because it is unclear how much of it is useful. In such cases it can be risky to incorporate side information into the mining process, since it can either improve the quality of the representation used for mining or add noise to the process. In many text mining applications, side information is available along with the text documents. Such side information may be of several kinds, for instance document provenance information, the links in the document, user access behavior from web logs, or other non-textual attributes embedded in the text document. Such attributes may contain a large amount of information for clustering purposes. The proposed system combines clustering with summarization methods. Alongside the COATES algorithm, we use a summarization step that merges duplicate clusters and produces a final summary. With the COATES clustering algorithm, clusters are formed on the basis of both content and auxiliary attributes. The aim of this project is therefore to design an effective clustering algorithm, and two algorithms are used for clustering. The first is COATES, which combines classical partitioning algorithms with probabilistic models. The second is a hierarchical algorithm, which the proposed system implements and compares with COATES. The system also implements a merging and summary generation algorithm that produces a summary, or "pure" data, for the user's convenience.
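The general idea of clustering on both content and auxiliary attributes, merging clusters, and then producing a per-cluster summary can be illustrated with a small sketch. This is not the COATES algorithm and not the authors' implementation: it is a simplified average-linkage hierarchical clustering over a blended similarity, and the names `make_doc`, `doc_sim`, `cluster`, `summarize`, the weight `alpha`, the `threshold`, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def make_doc(text, side):
    """Hypothetical document record: bag of words plus a set of side attributes."""
    return {"text": text, "bow": Counter(text.lower().split()), "side": set(side)}

def text_sim(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def side_sim(a, b):
    """Jaccard similarity between two sets of side attributes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def doc_sim(d1, d2, alpha=0.7):
    # alpha is an assumed weight: 1.0 ignores side information entirely.
    return (alpha * text_sim(d1["bow"], d2["bow"])
            + (1 - alpha) * side_sim(d1["side"], d2["side"]))

def cluster_sim(c1, c2, docs, alpha):
    """Average-linkage similarity between two clusters of document indices."""
    total = sum(doc_sim(docs[i], docs[j], alpha) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster(docs, threshold=0.3, alpha=0.7):
    """Repeatedly merge the two most similar clusters until none exceed threshold."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), cluster_sim(clusters[x], clusters[y], docs, alpha))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair_score: pair_score[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

def summarize(cluster_ids, docs, alpha=0.7):
    """Pick the document most similar, in total, to the rest of its cluster."""
    return max(cluster_ids,
               key=lambda i: sum(doc_sim(docs[i], docs[j], alpha) for j in cluster_ids))

docs = [
    make_doc("stock markets fall as rates rise", {"src:finance", "link:bank"}),
    make_doc("rates rise and stock prices fall", {"src:finance", "link:bank"}),
    make_doc("team wins the cup final", {"src:sports"}),
    make_doc("cup final won by home team", {"src:sports", "link:league"}),
]

for members in cluster(docs):
    representative = summarize(members, docs)
    print(members, "->", docs[representative]["text"])
```

With the sample data, the two finance documents end up in one cluster and the two sports documents in another, and each cluster is represented by one of its own documents. Lowering `alpha` gives the side attributes more influence, which is where the risk noted in the abstract appears: noisy attributes can pull unrelated documents together.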
