TEXT INFORMATION CLUSTERING METHOD AND TEXT INFORMATION CLUSTERING SYSTEM

Abstract

A text information clustering method and system. The clustering method comprises the following steps: performing word segmentation on each of multiple pieces of text information to obtain multiple words (S101); performing initial clustering on the segmented pieces of text information to form multiple first-level subjects, each first-level subject comprising at least two pieces of text information (S102); determining the number of second-level subjects under each first-level subject according to the number of pieces of text information under that first-level subject (S103); and performing secondary clustering on the at least two pieces of text information in each first-level subject, according to the number of second-level subjects determined for that first-level subject, to form multiple second-level subjects (S104). With this layered approach, the initial clustering only has to produce a reduced total number of first-level subjects, which improves computational efficiency; in the secondary clustering, the number of second-level subjects is determined dynamically from the number of pieces of text information, which speeds up computation of the second-level subjects.
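
Steps S101 to S104 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name a segmenter, a clustering algorithm, or the rule that maps document count to second-level subject count. Here a regular-expression tokenizer stands in for word segmentation, a plain k-means over normalized bag-of-words vectors stands in for both clustering passes, and `subtopic_count` with its `docs_per_subtopic` parameter is a hypothetical rule. Note that k-means does not by itself guarantee at least two texts per first-level subject.

```python
import math
import re
from collections import Counter


def segment(text):
    # S101: stand-in word segmentation (assumption: lowercase word split).
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(docs_tokens):
    # Bag-of-words vectors scaled to unit length (assumed representation).
    vocab = sorted({w for toks in docs_tokens for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in docs_tokens:
        v = [0.0] * len(vocab)
        for w, c in Counter(toks).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    # Plain k-means with deterministic farthest-point initialization
    # (a swapped-in technique; the abstract does not specify the algorithm).
    k = max(1, min(k, len(vectors)))
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def subtopic_count(n_docs, docs_per_subtopic=2):
    # S103: hypothetical rule, one second-level subject per
    # `docs_per_subtopic` texts, and always at least one.
    return max(1, n_docs // docs_per_subtopic)


def two_level_cluster(texts, n_first_level, docs_per_subtopic=2):
    if not texts:
        return {}
    vectors = vectorize([segment(t) for t in texts])       # S101
    first = kmeans(vectors, n_first_level)                 # S102
    result = {}
    for topic in sorted(set(first)):
        idx = [i for i, lab in enumerate(first) if lab == topic]
        k2 = subtopic_count(len(idx), docs_per_subtopic)   # S103
        second = kmeans([vectors[i] for i in idx], k2)     # S104
        result[topic] = {}
        for i, sub in zip(idx, second):
            result[topic].setdefault(sub, []).append(texts[i])
    return result
```

Calling `two_level_cluster(texts, n_first_level=2)` returns a nested dictionary that maps each first-level subject to its second-level subjects and their texts. Keeping `n_first_level` small makes the first pass cheap, and each second pass only sees the texts of one first-level subject, which is the efficiency argument the abstract makes.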
