Conference: International Conference of Signal Processing and Intelligent Systems

Text coherence new method using word2vec sentence vectors and most likely n-grams

Abstract

Evaluating discourse coherence models remains a challenging task across Natural Language Processing subfields. Most proposed approaches focus on feature engineering, relying on sophisticated features to capture the logical, syntactic, or semantic relationships between the sentences of a text. This paper investigates the automatic evaluation of text coherence. We introduce a fully automatic, rich statistical model of local and global coherence that uses the word2vec approach to assess the coherence of a document. Our modeling approach relies on numerical vectors derived from the word2vec algorithm applied to a very large collection of texts. We combine the word2vec vectors and the most likely n-grams with cohesive LD-n-grams perplexity to assess the coherence and topic integrity of a document. We present experimental results that assess the model's predictive power and indicate that it does not depend on the language or its semantic concepts, so it can be applied to any language. Our model achieves state-of-the-art performance on the coherence evaluation and order discrimination tasks on two datasets widely used by previous methods.
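
The abstract does not spell out how sentence vectors or the coherence score are computed. The Python below is a minimal sketch of one common way to use word2vec for this, under assumptions that the source does not confirm: a sentence vector is taken as the average of its word vectors, and local coherence is taken as the mean cosine similarity between adjacent sentences. The names `TOY_VECTORS`, `sentence_vector`, `cosine`, and `local_coherence` are hypothetical, the vectors are made-up toy values standing in for a trained model, and the n-gram perplexity component is not covered.

```python
import math

# Hypothetical 2-dimensional word vectors. A real system would load vectors
# trained with word2vec on a large corpus instead of these toy values.
TOY_VECTORS = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "pet": [0.8, 0.2],
    "stock": [0.0, 1.0],
    "market": [0.1, 0.9],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of the known tokens (assumed composition rule).

    Returns None when no token has a vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def local_coherence(sentences, vectors):
    """Mean cosine similarity between adjacent sentence vectors.

    Sentences with no known tokens are skipped. Returns 0.0 when fewer
    than two sentence vectors are available.
    """
    vecs = [sentence_vector(s, vectors) for s in sentences]
    vecs = [v for v in vecs if v is not None]
    if len(vecs) < 2:
        return 0.0
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims)


# Order discrimination in miniature: score a text and a permutation of it.
original = [["cat", "pet"], ["dog", "pet"], ["stock", "market"]]
shuffled = [["cat", "pet"], ["stock", "market"], ["dog", "pet"]]

print("original:", local_coherence(original, TOY_VECTORS))
print("shuffled:", local_coherence(shuffled, TOY_VECTORS))
```

In this toy setup the original ordering scores higher than the shuffled one, because the two pet-related sentences are adjacent only in the original. That comparison is the idea behind an order discrimination test, although the scoring function here is only an illustration and not the model described in the abstract.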
