Journal: Library Hi Tech

Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC



Abstract

Purpose – The purpose of this paper is to present a topic classification model using the Dewey Decimal Classification (DDC) as the target scheme. This is to be done by exploring metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations. The reason is to reduce the effort of document processing in digital libraries. Further, the paper seeks to perform feature selection and extension by means of social ontologies and related web-based lexical resources. This is done to provide reliable topic-related classifications while circumventing the problem of data sparseness. Finally, the paper aims to evaluate the model by means of two language-specific corpora. The paper bridges digital libraries, on the one hand, and computational linguistics, on the other. The aim is to make accessible computational linguistic methods to provide thematic classifications in digital libraries based on closed topic models such as the DDC.

Design/methodology/approach – The approach takes the form of text classification, text-technology, computational linguistics, computational semantics, and social semantics.

Findings – It is shown that SVM-based classifiers perform best by exploring certain selections of OAI document metadata.

Research limitations/implications – The findings show that it is necessary to further develop SVM-based DDC-classifiers by using larger training sets, possibly for more than two languages, in order to get better F-measure values.

Originality/value – Algorithmic and formal-mathematical information is provided on how to build DDC-classifiers for digital libraries.
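As a rough illustration of two ideas named in the abstract, the sketch below derives a text snippet from an OAI Dublin Core record and computes a per-class F-measure. It is not the authors' implementation. The sample record, the choice of fields (`title`, `subject`, `description`), and the function names `snippet_from_record` and `f_measure` are assumptions made for this example. The SVM classifier itself and the feature extension via social ontologies are not shown.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used in oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record, invented for illustration only.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Introduction to algebraic topology</dc:title>
  <dc:subject>Mathematics</dc:subject>
  <dc:description>Lecture notes on homology and homotopy.</dc:description>
  <dc:language>en</dc:language>
</oai_dc:dc>"""


def snippet_from_record(xml_text, fields=("title", "subject", "description")):
    """Concatenate selected metadata fields into one short text snippet.

    Which fields to select is an assumption here; the abstract only says
    that certain selections of OAI metadata work best.
    """
    root = ET.fromstring(xml_text)
    parts = []
    for field in fields:
        for elem in root.iter(DC + field):
            if elem.text and elem.text.strip():
                parts.append(elem.text.strip())
    return " ".join(parts)


def f_measure(gold, predicted, label):
    """Balanced F-measure (F1) for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(snippet_from_record(RECORD))
    # Toy gold and predicted DDC classes (510 = mathematics, 004 = computer science).
    print(f_measure(["510", "510", "004"], ["510", "004", "004"], "510"))
```

In a full pipeline, snippets of this kind would be turned into feature vectors and passed to a classifier, and the F-measure would be computed per DDC class on held-out data.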
