
Method for categorizing documents by multilevel feature selection and hierarchical clustering based on parts of speech tagging


Abstract

A method for categorizing documents is disclosed. The words composing the documents are tagged according to their parts of speech. A first set of features is selected corresponding to one of the parts of speech. The documents are grouped into clusters according to their semantic affinity to the first set of features and to each other. The clusters are then refined into a hierarchy of progressively finer clusters, the features of which are selected based on corresponding parts of speech.
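The abstract describes the flow only at a high level, so the following is a minimal Python sketch of that flow under stated assumptions, not the patented implementation. The toy lexicon `TOY_LEXICON` stands in for a real part-of-speech tagger, cosine similarity over word counts stands in for "semantic affinity", and a greedy single-pass grouping stands in for whatever clustering algorithm the patent actually uses. All function names, the threshold value, and the sample documents are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "printer": "NOUN", "paper": "NOUN", "toner": "NOUN",
    "server": "NOUN", "disk": "NOUN",
    "jams": "VERB", "leaks": "VERB", "crashes": "VERB", "fails": "VERB",
}


def tag(doc):
    """Tag each word with its part of speech (unknown words become OTHER)."""
    return [(w, TOY_LEXICON.get(w, "OTHER")) for w in doc.lower().split()]


def features(doc, pos):
    """Select as features only the words carrying the given part of speech."""
    return Counter(w for w, t in tag(doc) if t == pos)


def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed affinity measure)."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, pos, threshold):
    """Greedy single-pass grouping: a document joins the first cluster whose
    first member is similar enough on the selected features, else starts a new one."""
    clusters = []
    for doc in docs:
        f = features(doc, pos)
        for members in clusters:
            if cosine(f, features(members[0], pos)) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def categorize(docs, pos_levels, threshold=0.3):
    """Build a nested hierarchy, using one part of speech per level."""
    if not pos_levels:
        return docs
    first, rest = pos_levels[0], pos_levels[1:]
    return [categorize(c, rest, threshold) for c in cluster(docs, first, threshold)]


docs = [
    "printer jams paper",
    "printer leaks toner",
    "printer jams again",
    "server crashes",
    "server disk fails",
]

# Level 1 groups by nouns (the device), level 2 refines by verbs (the fault).
hierarchy = categorize(docs, ["NOUN", "VERB"])
for top in hierarchy:
    print(top)
```

In this sketch the noun level separates printer documents from server documents, and the verb level then splits each group by the reported fault. The order of `pos_levels` is an illustrative choice; the abstract does not say which part of speech is used at which level.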

Bibliographic data

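  • Publication number: US2003236659A1

  • Publication date: 2003-12-25

  • Original format: PDF

  • Applicant/assignee: CASTELLANOS MALU

  • Application number: US20020177892

  • Inventor: MALU CASTELLANOS

  • Filing date: 2002-06-20

  • Classification: G06F17/28

  • Country: US

  • Added to database: 2022-08-21 23:15:05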
