
Hierarchical Orderings of Textual Units



Abstract

Text representation is a central task for any approach to automatic learning from texts. It requires a format that allows texts to be interrelated even if they share no content words but deal with similar topics. Furthermore, measuring text similarity raises the question of how to organize the resulting clusters. This paper presents cohesion trees (CTs) as a data structure for the perspectival, hierarchical organization of text corpora. CTs operate on alternative text representation models that take lexical organization, quantitative text characteristics, and text structure into account. It is shown that CTs produce text linkages that are lexically more homogeneous than those produced by minimal spanning trees.
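The abstract evaluates cohesion trees against minimal spanning trees (MSTs) built over pairwise text similarities. The sketch below illustrates only that MST baseline, not the cohesion-tree construction, which the abstract does not specify. Several assumptions are not taken from the paper: texts are represented as naive bag-of-words vectors, dissimilarity is cosine distance, and the tree is built with Prim's algorithm. The function names `bag_of_words`, `cosine_distance`, and `minimal_spanning_tree` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Assumed representation: lowercased whitespace tokens with raw counts.
    return Counter(text.lower().split())


def cosine_distance(a, b):
    # 1 - cosine similarity of two sparse count vectors; 1.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def minimal_spanning_tree(texts):
    """Prim's algorithm on the complete graph of texts.

    Returns a list of (parent_index, child_index, distance) edges.
    """
    vectors = [bag_of_words(t) for t in texts]
    n = len(vectors)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each text
    parent = [-1] * n
    best[0] = 0.0  # start the tree at the first text
    edges = []
    for _ in range(n):
        # Attach the text that is currently closest to the growing tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax distances from the newly attached text to all outside texts.
        for v in range(n):
            if not in_tree[v]:
                d = cosine_distance(vectors[u], vectors[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "a cat sat on a mat",
        "stock markets fell sharply",
        "markets fell on monday",
    ]
    for i, j, d in minimal_spanning_tree(corpus):
        print(i, j, round(d, 3))
```

In this toy corpus, the two topical pairs are joined only through the shared function word "on". An MST must connect every text through whatever link is cheapest, so it can produce cross-topic links that rest on very little lexical overlap.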
