Building Topic Models in a Federated Digital Library Through Selective Document Exclusion


Abstract

Building topic models in federated digital collections presents numerous challenges due to metadata inconsistencies. The quality of topical metadata is difficult to ascertain, and it is often interspersed with irrelevant administrative metadata. In this study, we propose a way to improve topic modeling in large collections by identifying documents that convey only weak topical information. These documents are ignored when training topic models. Their topical associations are instead inferred after model training. A method is outlined for identifying weakly topical documents by defining runs of similar documents in a collection. In preliminary evaluation using a corpus from the Institute of Museum and Library Services Digital Collections and Content aggregation, results show an increase in coherence among words in topics. In showing this, we demonstrate that it may be beneficial to induce topic models using less but higher-quality data.
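The exclusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how runs of similar documents are defined, so everything specific here is an assumption made for illustration: documents are token lists in collection order, adjacent documents are compared with Jaccard similarity, and the names `find_runs`, `split_corpus`, `threshold`, and `min_run` are hypothetical. It is not the authors' implementation.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_runs(docs: List[List[str]], threshold: float = 0.8,
              min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of runs of similar neighbors.

    Assumption: a run is a maximal stretch of consecutive documents in which
    every adjacent pair has Jaccard similarity >= threshold, and only runs of
    at least min_run documents count.
    """
    runs = []
    start = 0
    for i in range(1, len(docs) + 1):
        # A run closes at the end of the collection or when similarity drops.
        if i == len(docs) or jaccard(set(docs[i - 1]), set(docs[i])) < threshold:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs


def split_corpus(docs: List[List[str]], threshold: float = 0.8,
                 min_run: int = 3) -> Tuple[List[int], List[int]]:
    """Split document indices into (training, held_out).

    Documents inside a run are treated as weakly topical and held out of
    training; their topics would be inferred after the model is trained.
    """
    held_out = {i for s, e in find_runs(docs, threshold, min_run)
                for i in range(s, e)}
    train = [i for i in range(len(docs)) if i not in held_out]
    return train, sorted(held_out)


docs = [
    ["civil", "war", "letters", "illinois"],
    ["image", "jpeg", "rights", "reserved"],
    ["image", "jpeg", "rights", "reserved"],
    ["image", "jpeg", "rights", "reserved", "archive"],
    ["railroad", "maps", "chicago"],
]
train_ids, held_out_ids = split_corpus(docs, threshold=0.7, min_run=3)
```

In this toy collection the three near-identical administrative records form one run and are held out, while the two topically distinct records remain for training. A topic model would then be fit on the training indices only, and topic assignments for the held-out documents inferred from the fitted model.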

Bibliographic details

  • Source
    ASIST annual meeting | 2011 | pp. 1-10 | 10 pages
  • Conference location: New Orleans LA (US)
  • Author affiliations

    Graduate School of Library and Information Science University of Illinois Urbana-Champaign 501 E. Daniel St. Champaign IL 61820 mefron@illinois.edu;

    Graduate School of Library and Information Science University of Illinois Urbana-Champaign 501 E. Daniel St. Champaign IL 61820 organis2@illinois.edu;

    Graduate School of Library and Information Science University of Illinois Urbana-Champaign 501 E. Daniel St. Champaign IL 61820 kfenlon2@illinois.edu;

  • Conference organizer
  • Original format: PDF
  • Language of text
  • Chinese Library Classification
  • Keywords

    Digital libraries; latent topic models; document representation;

