Some Issues in the Automatic Classification of U.S. Patents Working Notes for the AAAI-98 Workshop on Learning for Text Categorization

Abstract

The classification of U.S. patents poses special problems because of the enormous size of the corpus, the size and complex hierarchical structure of the classification system, and the size and structure of the patent documents themselves. The representation of complex document structure has received little attention in previous work, but we have found it to be an important factor in ours. We are exploring ways to use this structure, together with the hierarchical relations among patent subclasses, to facilitate the classification of patents. Our approach is to represent each document by a vector of terms and phrases derived from the most important parts of the patent. We use both k-nearest-neighbor classifiers and Bayesian classifiers. The k-nearest-neighbor classifier allows us to represent document structure using the query operators of the Inquery information retrieval system. The Bayesian classifiers can use the hierarchical relations among patent subclasses to select closely related negative examples, which are then used to train more discriminating classifiers.
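The idea of drawing closely related negative examples from the subclass hierarchy can be illustrated with a small sketch. It rests on several assumptions that are not from the abstract: a toy, hypothetical two-level hierarchy (`PARENT`, with made-up codes such as `"A/1"`), documents already reduced to token lists, and a plain two-class multinomial naive Bayes with add-one smoothing standing in for the authors' Bayesian classifiers, whose exact form the abstract does not specify. Here "closely related" is taken to mean sibling subclasses, i.e. subclasses that share the target's parent.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy (subclass -> parent class); these are NOT real USPC codes.
PARENT = {"A/1": "A", "A/2": "A", "A/3": "A", "B/1": "B"}


def sibling_negatives(target, labeled_docs, parent=PARENT):
    """Token lists of docs in subclasses that share target's parent but are not target."""
    p = parent[target]
    return [toks for toks, sub in labeled_docs
            if sub != target and parent.get(sub) == p]


class BinaryNaiveBayes:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, positives, negatives):
        if not positives or not negatives:
            raise ValueError("need at least one positive and one negative example")
        self.vocab = {t for doc in positives + negatives for t in doc}
        self.counts = {True: Counter(), False: Counter()}
        for doc in positives:
            self.counts[True].update(doc)   # term frequencies in the target subclass
        for doc in negatives:
            self.counts[False].update(doc)  # term frequencies in sibling subclasses
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        n = len(positives) + len(negatives)
        self.log_prior = {True: math.log(len(positives) / n),
                          False: math.log(len(negatives) / n)}
        return self

    def log_score(self, doc, c):
        v = len(self.vocab)
        score = self.log_prior[c]
        for t in doc:
            if t in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        """True if doc is assigned to the target subclass."""
        return self.log_score(doc, True) > self.log_score(doc, False)


docs = [
    (["gear", "shaft", "torque"], "A/1"),
    (["gear", "clutch", "torque"], "A/1"),
    (["gear", "belt", "pulley"], "A/2"),
    (["belt", "pulley", "tension"], "A/3"),
    (["antenna", "signal"], "B/1"),  # different parent, never used as a negative
]
positives = [toks for toks, sub in docs if sub == "A/1"]
negatives = sibling_negatives("A/1", docs)
clf = BinaryNaiveBayes().fit(positives, negatives)
print(len(negatives))                            # 2
print(clf.predict(["gear", "torque", "shaft"]))  # True
print(clf.predict(["belt", "pulley"]))           # False
```

Restricting negatives to siblings pushes the classifier to weight the terms that separate near neighbours (in the toy data, `torque`/`shaft` versus `belt`/`pulley`) rather than terms that merely separate distant parts of the hierarchy. Since the abstract describes the real classification system as large and hierarchically complex, a practical version would also have to decide which ancestor level defines "closely related".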
