Journal: Data & Knowledge Engineering

Narrative-based taxonomy distillation for effective indexing of text collections

Abstract

Taxonomies embody formalized knowledge and define aggregations between concepts/categories in a given domain, facilitating the organization of the data and making the contents easily accessible to users. Since taxonomies play significant roles in data annotation, search, and navigation, they are often carefully engineered. However, especially in domains such as news, where content evolves dynamically, they do not necessarily reflect the knowledge contained in the content. Thus, in this paper, we ask, and answer in the affirmative, the following question: "is it possible to efficiently and effectively adapt a given taxonomy to a usage context defined by a corpus of documents?" In particular, we recognize that the primary role of a taxonomy is to describe or narrate the natural relationships between concepts in a given document corpus. Therefore, a corpus-aware adaptation of a taxonomy should essentially distill the structure of the existing taxonomy by appropriately segmenting and, if needed, summarizing this narrative relative to the content of the corpus. Based on this key observation, we propose A Narrative Interpretation of Taxonomies for their Adaptation (ANITA) for restructuring existing taxonomies to fit varying application contexts, and we evaluate the proposed scheme using different text collections. Finally, we provide user studies showing that the proposed algorithm is able to adapt the taxonomy into a new, compact, and understandable structure.
