Multi-granularity hierarchical topic-based segmentation of structured, digital library resources

Zhongyi Wang; Jin Zhang; Jing Huang

Abstract

Purpose - Current segmentation systems almost invariably focus on linear segmentation and can only divide text into linear sequences of segments. This suits cohesive text such as news feeds but not coherent texts such as digital library documents, which have hierarchical structures. To move beyond linear segmentation and achieve hierarchical segmentation of a digital library's structured resources, this paper proposes a new multi-granularity hierarchical topic-based segmentation system (MHTSS) to decide section breaks.

Design/methodology/approach - MHTSS adopts an up-down segmentation strategy to divide a structured, digital library document into a document segmentation tree. Specifically, it works in three stages: document parsing, coarse segmentation based on document access structures, and fine-grained segmentation based on lexical cohesion.

Findings - The paper analyzes the limitations of existing document segmentation methods for structured, digital library resources. The authors found that document access structures and lexical cohesion techniques complement each other and, when combined, allow for a better segmentation of structured, digital library resources. Based on this finding, the paper proposes MHTSS for structured, digital library resources. To evaluate it, MHTSS was compared with the TT and C99 algorithms on real-world digital library corpora. The comparison showed that MHTSS achieves the top overall performance.

Practical implications - With MHTSS, digital library users can retrieve relevant information directly as segments instead of receiving the whole document. This will improve retrieval performance and dramatically reduce information overload.

Originality/value - This paper proposes MHTSS for structured, digital library resources, which combines document access structures and lexical cohesion techniques to decide section breaks. With this system, end-users can access a document by sections through a document structure tree.
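
The three stages named in the abstract (parsing, coarse segmentation by access structures, fine-grained segmentation by lexical cohesion) can be illustrated with a small sketch. This is not the authors' MHTSS implementation. Every name in it (`Node`, `segment`, `fine_segments`), the `# ` heading marker used as the access structure, and the 0.1 similarity threshold are assumptions made for illustration. The fine-grained step uses a plain cosine-similarity cutoff between adjacent sentences as a stand-in; the paper's own cohesion method is not described in the abstract and may differ substantially.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a document segmentation tree (hypothetical structure)."""
    title: str
    sentences: list = field(default_factory=list)
    children: list = field(default_factory=list)


def bag(sentence: str) -> Counter:
    """Lower-cased bag of words for one sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def fine_segments(sentences, threshold=0.1):
    """Start a new group where adjacent sentences share too little vocabulary.

    The threshold rule is an assumed stand-in for a lexical cohesion method.
    """
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag(prev), bag(cur)) < threshold:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    return groups


def segment(text: str, threshold: float = 0.1) -> Node:
    """Build a two-level segmentation tree from text with '# ' headings."""
    root = Node("document")
    section = None
    # Stages 1 and 2: parse lines and split coarsely on the assumed heading marker.
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            section = Node(line[2:])
            root.children.append(section)
            continue
        if section is None:  # body text before any heading
            section = Node("(untitled)")
            root.children.append(section)
        section.sentences.extend(
            s for s in re.split(r"(?<=[.!?])\s+", line) if s
        )
    # Stage 3: split each section further by lexical cohesion.
    for sec in root.children:
        for i, group in enumerate(fine_segments(sec.sentences, threshold), 1):
            sec.children.append(Node(f"{sec.title} / segment {i}", group))
    return root
```

Calling `segment(text)` on a document whose sections are introduced by `# ` lines returns a root `Node` whose children are the sections and whose grandchildren are the finer topical segments, which mirrors the idea of accessing a document by sections through a tree.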

Bibliographic details

Source
The Electronic Library | 2017, Issue 1 | pp. 99-120 | 22 pages
Authors
Zhongyi Wang; Jin Zhang; Jing Huang
Author affiliations

School of Information Management, Central China Normal University, Wuhan City, Hubei Province, China;

School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA;

Wuhan Polytechnic, Wuhan City, Hubei Province, China;

Original format: PDF
Text language: English
Keywords
hierarchical segmentation; access structures; AIC; digital library resources; lexical cohesion; optimum partitioning clustering; structured segmentation

Similar literature

1. Topic-Based Hierarchical Segmentation [J]. Chien J.-T., Chueh C.-H. IEEE Transactions on Audio, Speech, and Language Processing. 2012, Issue 1.

2. Martha L. Brogan, with the assistance of Daphnée Rentfrow. A Kaleidoscope of Digital American Literature. Washington, D.C.: Council on Library and Information Resources and the Digital Library Federation (Strategies and Tools for the Digital Library), 200 [J]. Michael Ryan. College & Research Libraries. 2006, Issue 3.

3. Martha L. Brogan, with the assistance of Daphnée Rentfrow. A Kaleidoscope of Digital American Literature. Washington, D.C.: Council on Library and Information Resources and the Digital Library Federation (Strategies and Tools for the Digital Library), 2005. 176p. alk. paper, $30 (ISBN 1932326170). LC 2005-22693. [J]. Michael Ryan. College & Research Libraries. 2006, Issue 3.

4. Construction of Reusable Integrable Multi-layer and Multi-granularity Educational Resource Library [C]. Zhenhua Li, Zhaoli Zhang, Tingting Liu. 2016 International Conference on Educational Innovation through Technology. 2016.

5. Enhancing a domain-specific digital library with metadata based on hierarchical controlled vocabularies [D]. Weaver, Mathew Jon. 2005.

6. The use of free resources in a subscription-based digital library: a case study of the North Carolina AHEC Digital Library [O]. Mary Beth Schell. 2006.

7. Evaluating hierarchical organisation structures for exploring digital libraries [O]. Hall M., Fernando S., Clough P. 2014.
