Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints

Sandor Daranyi; Peter Wittek; Milena Dobreva

Abstract

Digital libraries increasingly benefit from research on automated text categorization for improved access. Such research is typically carried out by means of standard test collections. In this article, we present a pilot experiment that replaces such test collections with a set of 6,000 objects from a real-world digital repository, indexed by Library of Congress Subject Headings, and tests support vector machines in a supervised learning setting for their ability to reproduce the existing classification. To augment the standard approach, we introduce a combination of two novel elements: using functions for document content representation in Hilbert space, and adding extra semantics from lexical resources to the representation. Results suggest that wavelet-based kernels slightly outperformed traditional kernels on classification reconstruction from abstracts, whereas traditional kernels slightly outperformed wavelet-based kernels on full-text documents, the latter outcome being due to word sense ambiguity. The practical implementation of our methodological framework enhances the analysis and representation of specific knowledge relevant to large-scale digital collections, in this case the thematic coverage of the collections. Representation of specific knowledge about digital collections is one of the basic elements of persistent archives, and a less studied one than the representation of digital objects and collections. Our research is an initial step in this direction, developing the methodological approach further and demonstrating that text categorization can be applied to analyse the thematic coverage of digital repositories.

Bibliographic details

Source
International journal on digital libraries | 2012, issue 1 | pp. 3-12 | 10 pages
Authors
Sandor Daranyi; Peter Wittek; Milena Dobreva
Affiliations

Swedish School of Library and Information Science, University of Boras, Boras, Sweden;

Swedish School of Library and Information Science, University of Boras, Boras, Sweden;

Centre for Digital Library Research, University of Strathclyde, Glasgow, UK;

Original format: PDF
Language: eng
Keywords
digital libraries; text categorization; machine learning; support vector machines; analogical information representation; wavelet analysis

Date added to database: 2022-08-18 02:07:46

Similar literature

1. Domain analysis with text mining: Analysis of digital library research trends using profiling methods [J]. Jae Yun Lee, Heejung Kim, Pan Jun Kim. Journal of Information Science. 2010, issue 2
2. Full-text federated search of text-based digital libraries in peer-to-peer networks [J]. Jie Lu, Jamie Callan. Information retrieval. 2006, issue 4
3. Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text [J]. Gadri Said, Moussaoui Abdelouahab. The international arab journal of information technology. 2017, issue 6
4. An Approach for Text Categorization in Digital Library [C]. Wang, Tao, Desai, . 2007
5. Applications of wavelets to nonlinear wave analysis and digital communication [D]. Yi, Eun-jik. 2000
6. Term Familiarity to indicate Perceived and Actual Difficulty of Text in Medical Digital Libraries [O]. Gondy Leroy, James E. Endicott
7. Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints [O]. Sándor Darányi, Peter Wittek, Milena Dobreva. 2012
