Information Processing & Management

Cluster-based patent retrieval

Abstract

Through the recent NTCIR workshops, patent retrieval has posed many challenging issues for the information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics make it necessary to reassess existing retrieval techniques, which have mainly been developed for short, unstructured documents such as newspaper articles. This study investigates cluster-based retrieval in the context of the invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters provide additional evidence for matching a user's information need. Thus far, cluster-based retrieval approaches have relied on automatically created clusters. Fortunately, all patents carry manually assigned cluster information: International Patent Classification (IPC) codes. The IPC is a standard taxonomy for classifying patents and currently has about 69,000 nodes organized into a five-level hierarchy. Patent documents could therefore provide the best test bed for developing and evaluating cluster-based retrieval techniques. Experiments on the NTCIR-4 patent collection showed that the cluster-based language model could help improve on the cluster-less baseline language model.
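
To make the idea of a cluster-based language model concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes query-likelihood scoring in which each document model is linearly interpolated with the model of its classification cluster and with the collection model. The interpolation weights, the function names, and the three toy patents (IDs, IPC subclass labels, and text) are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model as {term: probability}."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def score(query, doc_tokens, cluster_tokens, collection_tokens,
          lam_doc=0.5, lam_cluster=0.3):
    """Query log-likelihood under a document model smoothed by its cluster
    and by the whole collection. The weights are illustrative assumptions."""
    p_doc = unigram(doc_tokens)
    p_cluster = unigram(cluster_tokens)
    p_collection = unigram(collection_tokens)
    lam_collection = 1.0 - lam_doc - lam_cluster
    log_p = 0.0
    for term in query:
        p = (lam_doc * p_doc.get(term, 0.0)
             + lam_cluster * p_cluster.get(term, 0.0)
             + lam_collection * p_collection.get(term, 0.0))
        if p == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        log_p += math.log(p)
    return log_p


def rank(query, docs):
    """Rank patent IDs by smoothed query likelihood, best first.
    `docs` maps a patent ID to (cluster label, token list)."""
    collection = [t for _, tokens in docs.values() for t in tokens]
    clusters = {}
    for label, tokens in docs.values():
        clusters.setdefault(label, []).extend(tokens)
    scores = {
        pid: score(query, tokens, clusters[label], collection)
        for pid, (label, tokens) in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection: each patent has one cluster label and a few terms.
docs = {
    "P1": ("G06F", "query index retrieval engine".split()),
    "P2": ("G06F", "document search ranking".split()),
    "P3": ("A61K", "compound dosage tablet".split()),
}

ranking = rank(["search", "retrieval"], docs)
```

In this toy collection, neither P1 nor P2 contains both query terms, but each borrows probability mass for the missing term from the shared cluster model, so both rank above P3, whose cluster contains neither term. This is the sense in which a cluster can supply additional evidence for matching the information need.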
