首页>
外国专利>
PHRASE BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION
PHRASE BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION
展开▼
机译:基于短语的自动短语抽取文档聚类
展开▼
页面导航
摘要
著录项
相似文献
摘要
Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
展开▼