Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC

Alexander Mehler; Ulli Waltinger

Abstract

Purpose – The purpose of this paper is to present a topic classification model using the Dewey Decimal Classification (DDC) as the target scheme. This is done by exploring metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations. The reason is to reduce the effort of document processing in digital libraries. Further, the paper seeks to perform feature selection and extension by means of social ontologies and related web-based lexical resources. This is done to provide reliable topic-related classifications while circumventing the problem of data sparseness. Finally, the paper aims to evaluate the model by means of two language-specific corpora. The paper bridges digital libraries, on the one hand, and computational linguistics, on the other. The aim is to make computational linguistic methods accessible in order to provide thematic classifications in digital libraries based on closed topic models such as the DDC.

Design/methodology/approach – The approach takes the form of text classification, text-technology, computational linguistics, computational semantics, and social semantics.

Findings – It is shown that SVM-based classifiers perform best by exploring certain selections of OAI document metadata.

Research limitations/implications – The findings show that it is necessary to further develop SVM-based DDC-classifiers by using larger training sets, possibly for more than two languages, in order to get better F-measure values.

Originality/value – Algorithmic and formal-mathematical information is provided on how to build DDC-classifiers for digital libraries.

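The abstract reports classifier quality in terms of F-measure values but does not give the formula or the averaging scheme. As a rough illustration only, the sketch below assumes the standard F1 score (the harmonic mean of precision and recall), computed per class for single-label predictions and then macro-averaged. The function names and the DDC main-class labels ("000", "500", "900") are hypothetical examples, not data or code from the paper.

```python
from collections import Counter


def per_class_f1(gold, predicted):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct assignment to class g
        else:
            fp[p] += 1          # p was assigned but is wrong
            fn[g] += 1          # g should have been assigned but was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        assigned = tp[label] + fp[label]
        relevant = tp[label] + fn[label]
        precision = tp[label] / assigned if assigned else 0.0
        recall = tp[label] / relevant if relevant else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1 values (0.0 for empty input)."""
    scores = per_class_f1(gold, predicted)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical DDC main-class labels for five documents (illustrative only).
gold = ["000", "000", "500", "500", "900"]
predicted = ["000", "500", "500", "500", "900"]

for label, (p, r, f1) in per_class_f1(gold, predicted).items():
    print(label, round(p, 3), round(r, 3), round(f1, 3))
print("macro F1:", round(macro_f1(gold, predicted), 3))
```

Macro-averaging weights every class equally, which makes sparse classes visible in the score. Whether the paper averages this way is not stated in the abstract.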

Bibliographic details

Source: Library hi tech, 2009, Issue 4, 20 pages
Authors: Alexander Mehler; Ulli Waltinger
Original format: PDF
Language: English (eng)
Chinese Library Classification: Library management
Keywords: Document management; Modelling; Digital libraries

Similar documents

1. Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC [J]. Alexander Mehler, Ulli Waltinger. Library hi tech, 2009, Issue 4.
2. Web document classification using topic modeling based document ranking [J]. Youngseok Lee, Jungwon Cho. International Journal of Electrical and Computer Engineering, 2021, Issue 3.
3. Exploiting the value of class labels on high-dimensional feature spaces: topic models for semi-supervised document classification [J]. Soleimani Hossein, Miller David J. Pattern Analysis and Applications, 2019, Issue 2.
4. Building Topic Models in a Federated Digital Library Through Selective Document Exclusion [C]. Miles Efron, Peter Organisciak, Katrina Fenlon. ASIST annual meeting, 2011.
5. Cooperative exchange of digital documents among electronic libraries. The case of Latin America: Model and cost analysis. [D]. Delgado, Carlos R. 2001.
6. Incorporating Statistical Topic Models in the Retrieval of Healthcare Documents [O]. Karla Caballero, Ram Akella. 2015.
7. Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC [O]. Mehler Alexander, Waltinger Ulli. 2009.
8. Text Classification of installation Support Contract Topic Models for Category Management. [R]. Sevier, W. C. 2018.
