Conference: International Conference on Theory and Practice of Digital Libraries

Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts Under Precision and Recall Constraints

Abstract

Digital libraries strive to integrate automatic subject indexing methods into operative information retrieval systems, yet integration is prevented by misleading and incomplete semantic annotations. For this reason, we investigate approaches to detect documents where quality criteria are met. In contrast to mainstream methods, our approach, named Qualle, estimates quality at the document level rather than the concept level. Qualle is implemented as a combination of different machine learning models in a deep, multi-layered regression architecture that comprises a variety of content-based indicators, in particular label set size calibration. We evaluated the approach on very short texts from law and economics, investigating the impact of different feature groups on recall estimation. Our results show that Qualle effectively identified subsets of previously unseen data where considerable gains in document-level recall can be achieved while precision is maintained. Such filtering can therefore be used to control compliance with data quality standards in practice. Qualle makes it possible to trade off indexing quality against collection coverage, and it can complement semi-automatic indexing to process large datasets more efficiently.
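The filtering idea in the abstract, keeping only documents whose estimated recall clears a threshold and accepting lower collection coverage in return, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `IndexedDoc`, `estimate_recall`, and `filter_by_estimated_recall` are hypothetical, and the simple ratio of assigned labels to an expected label count is an assumed stand-in for Qualle's learned regression over content-based indicators.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexedDoc:
    """A document with automatically assigned subject labels (hypothetical type)."""
    doc_id: str
    predicted_labels: List[str]   # labels proposed by some automatic indexer
    expected_label_count: float   # assumed output of a content-based label-count model


def estimate_recall(doc: IndexedDoc) -> float:
    """Crude recall proxy: assigned label count relative to the expected count.

    Assumption: a real system would learn this estimate with a regression
    model; the capped ratio here only illustrates label set size calibration.
    """
    if doc.expected_label_count <= 0:
        return 0.0
    return min(1.0, len(doc.predicted_labels) / doc.expected_label_count)


def filter_by_estimated_recall(
    docs: List[IndexedDoc], threshold: float
) -> Tuple[List[IndexedDoc], float]:
    """Keep documents whose estimated recall meets the threshold.

    Returns the accepted documents and the collection coverage, i.e. the
    fraction of the input that passed the filter.
    """
    accepted = [d for d in docs if estimate_recall(d) >= threshold]
    coverage = len(accepted) / len(docs) if docs else 0.0
    return accepted, coverage


# Made-up sample data for illustration only.
sample_docs = [
    IndexedDoc("a", ["x", "y", "z"], 3.2),
    IndexedDoc("b", ["x"], 4.0),
    IndexedDoc("c", ["x", "y"], 2.0),
]

if __name__ == "__main__":
    for threshold in (0.2, 0.8):
        accepted, coverage = filter_by_estimated_recall(sample_docs, threshold)
        print(threshold, [d.doc_id for d in accepted], round(coverage, 2))
```

Raising the threshold in this sketch shrinks the accepted subset, which mirrors the trade-off between indexing quality and collection coverage described above. Documents that fail the filter could be routed to semi-automatic or manual indexing.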
