PHRASE BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION

Abstract

Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
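
The abstract names mutual information only as one example of a suitable statistical metric and gives no formula. As a minimal sketch, assuming pointwise mutual information over adjacent word pairs and whitespace tokenization, phrase scoring might look like the following. The function names `score_bigrams` and `featurize` and the `min_count` threshold are hypothetical illustrations, not taken from the patent, and the sketch does not attempt the multiple-candidate-list storage optimization the abstract mentions.

```python
import math
from collections import Counter


def score_bigrams(documents, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Assumption: PMI stands in for the "mutual information metric" of the
    abstract; the patent text shown here does not specify the formula.
    Pairs seen fewer than min_count times are skipped, since PMI is
    unreliable for rare events.
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        # High PMI: the pair co-occurs far more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def featurize(doc, phrases):
    """Return the document's words plus any accepted phrases as features."""
    tokens = doc.lower().split()
    found = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:]) if (a, b) in phrases]
    return tokens + found


docs = [
    "new york is big",
    "new york is old",
    "the city is big",
    "the town is old",
]
scores = score_bigrams(docs)
best = max(scores, key=scores.get)
features = featurize("New York is big", {best})
```

On this toy corpus the pair ("new", "york") scores higher than pairs such as ("is", "big"), because "is" occurs in many contexts while "new" and "york" occur only together. The combined word-and-phrase features returned by `featurize` could then be passed to whatever clustering method is in use.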

Bibliographic data

Publication number: US2013185060A1

Patent type:
Publication date: 2013-07-18

Original format: PDF
Applicant/Assignee: STRATIFY INC.

Application number: US201313784187
Inventors: JOY THOMAS; KARTHIK RAMACHANDRAN

Filing date: 2013-03-04
Classification: G06F17/27
Country: US
Added to database: 2022-08-21 16:51:25
