Conference: Perspectives of System Informatics

Multi-classification of Patent Applications with Winnow

Abstract

The Winnow family of learning algorithms copes well with large numbers of features and tolerates variations in document length, which makes it suitable for classifying large collections of large documents, such as patent applications. Both the large size of the documents and the large number of training documents available for each class make this classification task qualitatively different from the classification of short documents (newspaper articles or medical abstracts) with few training examples, as exemplified by the TREC evaluations. This note describes recent experiments with Winnow on two large corpora of patent applications supplied by the European Patent Office (EPO). We find that the multi-classification of patent applications (where a document may receive several classes) is much less accurate than the mono-classification (exactly one class per document) of similar documents. We describe a potential pitfall in multi-classification and show ways to improve the accuracy. We argue that the inherently greater noisiness of multi-class labeling is the reason why multi-classification is harder than mono-classification.
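The abstract does not give the update rule itself. A minimal sketch follows, assuming the Balanced Winnow variant (a positive and a negative weight per feature, with multiplicative promotion by α and demotion by β applied only when a training example is misclassified), trained one-vs-rest so that each class gets its own binary classifier and a document may receive several classes. It is not the authors' implementation: the parameter values, the initial weights, the unit-length normalization of term counts, the helper names (`features`, `train_one_vs_rest`, `predict_multi`, `predict_mono`) and the toy classes `chem` and `mech` are all hypothetical. Because weights are stored only for features that occur, and each update touches only the features of the current document, the cost of training follows document content rather than vocabulary size.

```python
import math
from collections import Counter


def features(text):
    """Term counts scaled to unit Euclidean length (an assumed normalization
    that keeps long and short documents on a comparable scale)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


class BalancedWinnow:
    """One binary Balanced Winnow classifier with multiplicative updates."""

    def __init__(self, alpha=1.1, beta=0.9, theta=1.0, init_pos=2.0, init_neg=1.0):
        self.alpha = alpha          # promotion factor (> 1), assumed value
        self.beta = beta            # demotion factor (< 1), assumed value
        self.theta = theta          # decision threshold
        self.init_pos = init_pos    # initial positive weight (assumed)
        self.init_neg = init_neg    # initial negative weight (assumed)
        self.w_pos = {}             # entries are created only for updated features
        self.w_neg = {}

    def score(self, doc):
        """Sum of strength * (w+ - w-) over the document's active features."""
        return sum(
            s * (self.w_pos.get(f, self.init_pos) - self.w_neg.get(f, self.init_neg))
            for f, s in doc.items()
        )

    def update(self, doc, is_positive):
        """Mistake-driven update; returns True if the example was misclassified."""
        if (self.score(doc) > self.theta) == is_positive:
            return False
        # Missed positive: promote (w+ *= alpha, w- *= beta).
        # False positive: demote (w+ *= beta, w- *= alpha).
        up, down = (self.alpha, self.beta) if is_positive else (self.beta, self.alpha)
        for f in doc:  # only features active in this document change
            self.w_pos[f] = self.w_pos.get(f, self.init_pos) * up
            self.w_neg[f] = self.w_neg.get(f, self.init_neg) * down
        return True


def train_one_vs_rest(docs, classes, epochs=10):
    """Train one binary classifier per class; docs are (features, label_set) pairs."""
    models = {c: BalancedWinnow() for c in classes}
    for _ in range(epochs):
        for doc, labels in docs:
            for c, model in models.items():
                model.update(doc, c in labels)
    return models


def predict_multi(models, doc):
    """Multi-classification: every class whose classifier exceeds its threshold."""
    return {c for c, m in models.items() if m.score(doc) > m.theta}


def predict_mono(models, doc):
    """Mono-classification: only the single highest-scoring class."""
    return max(models, key=lambda c: models[c].score(doc))


if __name__ == "__main__":
    # Toy corpus with hypothetical class names; not EPO data.
    docs = [
        (features("polymer catalyst solvent"), {"chem"}),
        (features("catalyst reaction polymer"), {"chem"}),
        (features("gear shaft bearing"), {"mech"}),
        (features("bearing piston gear"), {"mech"}),
    ]
    models = train_one_vs_rest(docs, ["chem", "mech"], epochs=5)
    mixed = features("polymer catalyst gear shaft")
    print(sorted(predict_multi(models, mixed)))  # → ['chem', 'mech']
    print(predict_mono(models, mixed))           # → chem
```

`predict_multi` returns every class whose score exceeds θ, which is how the mixed toy document ends up with both labels, whereas `predict_mono` forces a single choice. The toy data only illustrates the mechanics and says nothing about the accuracy difference the abstract reports for the EPO corpora.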