Conference: International Conference on New Trends in Computing Sciences

Detecting Quotable Sentences from Text Using Syntactic Token Augmentation and Recurrent Neural Networks

Abstract

Since the beginning of the literary era, quotations have inspired writers and avid readers. The explosion of information, from the printing of books to the digitization of such media, hints at innumerable quotations that remain undiscovered. In this paper, we explore an approach to detect and extract quotable sentences from large text corpora. Analysis of quotations from gold-standard sources suggests that quotable sentences can be detected through their intrinsic properties: the sentence structure and style of speech that make a quotation meaningful, concise, and effective. However, the subjectivity of these properties makes this problem more convoluted than other binary text classification problems. To address this, we describe a method that preserves sentence semantics during the filtering and classification of sentences. Our language model with Recurrent Neural Networks outperforms the other approaches to detecting quotations, reaching 84.30% accuracy, a significant 2.8% improvement over our own statistical model. The work also adds to the understanding of the impact of manually designed features on text classification problems.