Clustering search engine results has considerable practical value for improving the service quality and intelligence of search engines. The task is to group results by document relatedness, inferred from the limited information in titles and snippets. Traditional search result clustering methods either fail to make full use of the characteristics of search results or have high computational complexity. This paper proposes a search result clustering algorithm based on topic-word matching frequency. The algorithm selects topic words from high-frequency words and automatically generates categories from the co-occurrence of topic words, that is, from a graph of their semantic relevance; the remaining results are then clustered according to how frequently they match each category's topic-word list. Experiments show that, compared with the STC and LINGO algorithms, the proposed algorithm improves the quality of the search results.
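The pipeline summarized above (high-frequency words, then topic words, then co-occurrence categories, then assignment by matching frequency) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `cluster_snippets`, the thresholds `min_df`, `min_cooc` and `min_match`, the whitespace tokenizer, the small stopword list, and the use of connected components to form categories are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "of", "and", "for", "in", "to", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords; return a set of words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def cluster_snippets(snippets, min_df=2, min_cooc=2, min_match=1):
    """Sketch of topic-word clustering under the assumptions stated above.

    Returns (tables, assignment): tables is a list of topic-word lists, one
    per category; assignment[i] is the category index of snippets[i], or
    None when the snippet matches no category often enough.
    """
    docs = [tokenize(s) for s in snippets]

    # Step 1 (assumed rule): topic words are words found in >= min_df snippets.
    df = Counter(w for d in docs for w in d)
    topic_words = {w for w, c in df.items() if c >= min_df}

    # Step 2: count how often each pair of topic words shares a snippet.
    cooc = Counter()
    for d in docs:
        for a, b in combinations(sorted(d & topic_words), 2):
            cooc[(a, b)] += 1

    # Step 3 (assumed rule): link pairs that co-occur >= min_cooc times and
    # take connected components (via union-find) as categories.
    parent = {w: w for w in topic_words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (a, b), c in cooc.items():
        if c >= min_cooc:
            parent[find(a)] = find(b)

    groups = {}
    for w in topic_words:
        groups.setdefault(find(w), set()).add(w)
    tables = sorted(sorted(ws) for ws in groups.values())

    # Step 4: assign each snippet to the category whose topic-word table it
    # matches most often; ties go to the first category in sorted order.
    assignment = []
    for d in docs:
        scores = [len(d & set(t)) for t in tables]
        best = max(range(len(tables)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < min_match:
            assignment.append(None)
        else:
            assignment.append(best)
    return tables, assignment


snippets = [
    "apple iphone review",
    "apple iphone price",
    "banana fruit recipe",
    "banana fruit smoothie",
    "iphone case",
    "weather forecast",
]
tables, assignment = cluster_snippets(snippets)
print(tables)
print(assignment)
```

In this toy input, the snippet "iphone case" contains only one topic word, so it plays the role of the "remaining results" that are placed by matching frequency, while "weather forecast" matches no topic-word table and is left unassigned. In practice the query term itself would probably need to be excluded from the topic words, since it appears in nearly every snippet and would otherwise link all categories together.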