Journal: Applied Artificial Intelligence

TOWARD A MORE GLOBAL AND COHERENT SEGMENTATION OF TEXTS



Abstract

The automatic text segmentation task consists of identifying the most important thematic breaks in a document in order to cut it into homogeneous passages. Text segmentation has motivated a large amount of research. We focus here on statistical approaches that rely on an analysis of the distribution of words in the text. Usually, texts are segmented sequentially on the basis of very local clues. However, such an approach prevents the text from being considered globally, particularly with regard to the degree of granularity with which it expresses the different topics it addresses. We thus propose two new segmentation algorithms, ClassStruggle and SegGen, which use criteria that provide global views of texts. ClassStruggle is based on an initial clustering of the sentences of the text, which allows similarities to be considered within a group rather than between individual sentences. It relies on the distribution of the occurrences of the members of each class to segment the text. SegGen evaluates candidate segmentations of the whole text using a genetic algorithm. It attempts to find a segmentation that optimizes two criteria: maximizing the internal cohesion of the segments and minimizing the similarity between adjacent ones. According to experimental results, both approaches appear to be very competitive with existing methods.
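The two criteria named for SegGen can be illustrated with a small scoring sketch. This is not the authors' implementation: the abstract does not say how cohesion or similarity are measured, so the bag-of-words cosine similarity, the binary boundary vector, and the function names `internal_cohesion`, `adjacent_similarity`, and `evaluate` are all assumptions made for illustration. The genetic search itself is not shown; the sketch only scores one candidate segmentation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bag(sentences):
    """Bag of lowercase whitespace tokens for a list of sentences."""
    return Counter(w for s in sentences for w in s.lower().split())


def split_segments(sentences, boundaries):
    """Cut the text after sentence i wherever boundaries[i] == 1.

    Assumed encoding: one bit per gap between consecutive sentences.
    """
    segments, current = [], []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        if i < len(boundaries) and boundaries[i]:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def internal_cohesion(segments):
    """Mean pairwise sentence similarity within segments (assumed measure).

    One-sentence segments have no sentence pairs and are skipped.
    """
    scores = []
    for seg in segments:
        bags = [bag([s]) for s in seg]
        pairs = [(i, j) for i in range(len(bags)) for j in range(i + 1, len(bags))]
        if pairs:
            scores.append(sum(cosine(bags[i], bags[j]) for i, j in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


def adjacent_similarity(segments):
    """Mean similarity between consecutive segments (assumed measure)."""
    sims = [cosine(bag(a), bag(b)) for a, b in zip(segments, segments[1:])]
    return sum(sims) / len(sims) if sims else 0.0


def evaluate(sentences, boundaries):
    """Return (cohesion to maximize, adjacent similarity to minimize)."""
    segments = split_segments(sentences, boundaries)
    return internal_cohesion(segments), adjacent_similarity(segments)
```

Under these assumptions, a search procedure such as a genetic algorithm would compare candidate boundary vectors by calling `evaluate` on each and prefer those with higher cohesion and lower adjacent similarity. How the two values are traded off is left open here, since the abstract does not specify it.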
