
METHOD AND DEVICE FOR AUTOMATIC ANNOTATION OF CONTENT OF ELECTRONIC DOCUMENTS


Abstract

1. A method of annotating an electronic document, comprising the steps of: splitting the electronic document into a plurality of members, each of the plurality of members being associated with a corresponding length, a corresponding informativeness score, and a matching consistency score; automatically selecting a subset of the plurality of members such that the total informativeness score of the subset is maximized while the total length of the subset is less than or equal to a maximum length; and including said subset as an annotation to the electronic document.
2. The method of claim 1, wherein said subset contains fewer than all of the plurality of members.
3. The method of claim 1, wherein at least one of the members is a sentence.
4. The method of claim 1, wherein the corresponding informativeness score for a given member of the plurality of members is assigned in accordance with a scoring technique that is language independent.
5. The method of claim 4, wherein the scoring technique assigns weights to a plurality of attributes of the given member in accordance with a set of manually programmable rules.
6. The method of claim 1, wherein the corresponding informativeness score for a given member of the plurality of members is assigned in accordance with a scoring technique that is language dependent.
7. The method of claim 6, wherein the scoring technique is a supervised machine learning technique that uses a statistical classifier.
8. The method of claim 7, wherein the statistical classifier is a support vector machine.
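The selection step in claim 1 (maximize total informativeness subject to a bound on total length) has the shape of a 0/1 knapsack problem. The sketch below is one possible reading, not the patent's stated implementation: it assumes integer lengths, scores that are already assigned, and it ignores the consistency score because the claim does not say how that score enters the selection. The names `Member` and `select_annotation` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One unit of the split document (for example, a sentence)."""
    text: str
    length: int    # unit (characters, words) is an assumption
    score: float   # informativeness score, assumed to be given


def select_annotation(members, max_length):
    """Pick the subset with maximum total score whose total length
    does not exceed max_length (0/1 knapsack, dynamic programming)."""
    n = len(members)
    # best[i][c]: best total score using the first i members within capacity c
    best = [[0.0] * (max_length + 1) for _ in range(n + 1)]
    for i, m in enumerate(members, start=1):
        for c in range(max_length + 1):
            best[i][c] = best[i - 1][c]          # option 1: skip member i
            if m.length <= c:                    # option 2: take member i
                candidate = best[i - 1][c - m.length] + m.score
                if candidate > best[i][c]:
                    best[i][c] = candidate

    # Walk back through the table to recover which members were taken.
    chosen = []
    c = max_length
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(members[i - 1])
            c -= members[i - 1].length
    chosen.reverse()                             # restore document order
    return chosen


# Illustrative data only; lengths and scores are made up.
example_members = [
    Member("A", length=4, score=3.0),
    Member("B", length=3, score=2.0),
    Member("C", length=2, score=2.5),
    Member("D", length=5, score=4.0),
]
annotation = select_annotation(example_members, max_length=7)
print([m.text for m in annotation])
```

With this made-up data the selected subset is smaller than the full set of members, which matches the situation described in claim 2. The table has (number of members + 1) × (max_length + 1) entries, so the approach is practical only when the length bound is a moderate integer.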
