ACM Transactions on Information Systems

Taxonomy generation for text segments: a practical web-based approach

Abstract

It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed taxonomy. In this article, we address the problem of taxonomy generation for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the possibility of using highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then designed to create a hierarchical topic structure over the text segments: segments with closely related concepts are grouped together in a cluster, and relevant clusters are linked at the same or nearby levels. Unlike traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the algorithm aims to produce a more natural and comprehensive tree hierarchy. Extensive experiments were conducted on text segments from different domains, including subject terms, personal names, paper titles, and natural language questions. The experimental results demonstrate the potential of the proposed approach, which provides a basis for in-depth analysis of text segments on a larger scale and is expected to benefit many information systems.
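The two steps the abstract describes (enrich each short segment with search-result snippets, then cluster the enriched representations hierarchically) can be illustrated with a minimal sketch. It is not the authors' method: the abstract specifies neither the feature weighting, the similarity measure, nor the clustering algorithm. The sketch assumes the snippets were already retrieved (no search API is called), uses plain term-frequency vectors with cosine similarity, and swaps in standard average-linkage agglomerative clustering. That standard technique produces strictly binary merge trees, which is exactly the kind of "unnatural" hierarchy shape the paper's own algorithm is designed to avoid. All names (`snippet_vector`, `cosine`, `agglomerate`, `TOY_SNIPPETS`) and the threshold value are hypothetical, and the snippets are hand-written toy text rather than real search results.

```python
import math
import re
from collections import Counter

# Toy, hand-written "snippets" for illustration only (hypothetical data,
# not real search results). Each text segment maps to its snippets.
TOY_SNIPPETS = {
    "lion": ["lion big cat predator of the african savanna",
             "the lion is a large wild cat"],
    "tiger": ["tiger big cat predator of asian forests",
              "the tiger is a large striped cat"],
    "python": ["python programming language for data science",
               "learn python programming online"],
    "java": ["java programming language for enterprise software",
             "learn java programming online"],
}


def snippet_vector(snippets):
    """Term-frequency vector for one text segment, built from the
    search-result snippets assumed to have been retrieved for it."""
    tokens = []
    for snippet in snippets:
        tokens.extend(re.findall(r"[a-z]+", snippet.lower()))
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors, threshold):
    """Average-linkage agglomerative clustering over segment vectors.

    Repeatedly merges the most similar pair of clusters, recording each
    merge as a nested tuple, and stops when no pair's average similarity
    reaches `threshold`. Returns the remaining subtrees (a forest).
    """
    names = list(vectors)
    sim = {(a, b): cosine(vectors[a], vectors[b]) for a in names for b in names}
    # Each cluster is (member names, subtree); leaves are plain strings.
    clusters = [([name], name) for name in names]

    def average_link(c1, c2):
        return sum(sim[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = average_link(clusters[i][0], clusters[j][0])
                if score > best:
                    best, pair = score, (i, j)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0],
                  (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [tree for _, tree in clusters]


if __name__ == "__main__":
    vectors = {seg: snippet_vector(s) for seg, s in TOY_SNIPPETS.items()}
    print(agglomerate(vectors, threshold=0.3))
```

Run as a script, this prints `[('lion', 'tiger'), ('python', 'java')]`: the segments share no surface words across topics, but the snippets supply overlapping context terms ("cat", "predator"; "programming", "language") that make the two topic groups separable. Lowering `threshold` to `0.0` forces every merge and yields a single binary tree, which illustrates why a purely pairwise merge procedure tends toward deep, narrow hierarchies.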