A SYSTEM AND METHOD FOR DECISION DRIVEN HYBRID TEXT CLUSTERING

Abstract

The present invention discloses a method and a system for clustering short and long text documents. The documents are received through an input module and passed to a pre-processing module, which refines them and removes unwanted text. A decision-driven hybrid text clustering algorithm is then applied through several modules to cluster the documents. First, a context module computes a moment value of a feature, signifying at least one feature importance value of that feature for the documents. A decision module uses the moment value to calculate a decision score. Based on the decision score, the documents are split into two sets. A clustering module then forms clusters from the two sets of documents based on n-tuple word distribution. Finally, a convergence module combines the clusters into a final set of documents.
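
The module sequence in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the moment value, the decision score, the split rule, or how the n-tuple word distribution drives clustering, so every formula below is an assumption chosen only for illustration: the moment of a term is taken as its mean relative frequency across documents, the decision score as the average moment of a document's terms, the split threshold as the median score, and the cluster key as a document's most frequent n-gram. All function names and the stopword list are hypothetical and do not come from the patent.

```python
import re
from collections import Counter, defaultdict
from statistics import median

# Hypothetical stopword list standing in for "unwanted text".
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}

def preprocess(text):
    """Pre-processing module (assumed): lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def feature_moments(token_lists):
    """Context module (assumed): mean relative frequency of each term."""
    totals = Counter()
    for tokens in token_lists:
        for term, count in Counter(tokens).items():
            totals[term] += count / len(tokens)
    n_docs = len(token_lists)
    return {term: total / n_docs for term, total in totals.items()}

def decision_score(tokens, moments):
    """Decision module (assumed): average moment of the document's terms."""
    if not tokens:
        return 0.0
    return sum(moments[t] for t in tokens) / len(tokens)

def split_by_score(token_lists, moments):
    """Split document indices into two sets around the median score."""
    scores = [decision_score(t, moments) for t in token_lists]
    if not scores:
        return [], []
    threshold = median(scores)
    high = [i for i, s in enumerate(scores) if s >= threshold]
    low = [i for i, s in enumerate(scores) if s < threshold]
    return high, low

def cluster_by_ngram(indices, token_lists, n=2):
    """Clustering module (assumed): group by most frequent n-gram."""
    clusters = defaultdict(list)
    for i in indices:
        tokens = token_lists[i]
        grams = [tuple(tokens[j:j + n]) for j in range(len(tokens) - n + 1)]
        counts = Counter(grams or [tuple(tokens)])
        # Ties are broken alphabetically so the result is deterministic.
        key = min(counts, key=lambda g: (-counts[g], g))
        clusters[key].append(i)
    return dict(clusters)

def hybrid_cluster(texts, n=2):
    """Convergence module (assumed): collect clusters from both sets."""
    token_lists = [preprocess(t) for t in texts]
    moments = feature_moments(token_lists)
    final = []
    for subset in split_by_score(token_lists, moments):
        final.extend(cluster_by_ngram(subset, token_lists, n).values())
    return final

texts = [
    "Text clustering of documents",
    "text clustering of documents",
    "The patent law",
]
clusters = hybrid_cluster(texts)  # lists of document indices
```

Each returned cluster is a list of indices into `texts`. Keeping the split and the clustering as separate functions mirrors the separation between the decision module and the clustering module described above, so either assumed rule can be swapped out without touching the rest.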
