Conference: International Joint Conference on Computer Science and Software Engineering

Text Corpus for Natural Language Story-telling Sentence Generation: A Design and Evaluation



Abstract

Automatic generation of narrative sentences from unordered word sets is desirable in Augmentative and Alternative Communication (AAC) systems for children with certain learning disabilities (LD). Regardless of the complexity of the Natural Language Processing deployed in sentence generation procedures, the qualities of language models always affect the generation results. This work compared sentence generation accuracies obtained from a multi-tier N-gram-based procedure trained on BEST2010, a large publicly available text corpus, and a smaller but more specifically designed corpus in the task of Thai simple sentence generation. The latter, a new corpus called TELL-S, was created based on an analysis of the contents belonging to textbooks used in grade 1 and grade 2 for Thai language subjects according to the compulsory curriculum for Thai schools. The original procedure was also modified to incorporate additional constraints based on a story-telling guideline developed for LD children. Evaluated upon test sets of 195 sentences, each of which was composed of 3-6 words with a specific Part-Of-Speech combination, TELL-S was shown to provide better generalization and yielded higher accuracies than BEST2010 in all cases with unbiased word sets. The sentence generation accuracies were 100% and 70% for 3-word and 4-word sentences, respectively. The average accuracy was at 58.8% when longer sentences were also included.
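To make the task concrete, the following is a minimal sketch of ordering an unordered word set with a plain bigram model. It is not the authors' multi-tier procedure: it omits the Part-Of-Speech combinations and the story-telling constraints mentioned in the abstract, and the toy English training sentences, the function names (`train_bigrams`, `log_score`, `generate`), and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations
import math

# Hypothetical toy corpus; not drawn from TELL-S or BEST2010.
TRAIN = [
    ["the", "boy", "eats", "rice"],
    ["the", "girl", "eats", "rice"],
    ["the", "boy", "reads", "a", "book"],
]


def train_bigrams(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])             # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return context_counts, bigram_counts, len(vocab)


def log_score(order, context_counts, bigram_counts, vocab_size):
    """Log-probability of one word order under an add-one smoothed bigram model."""
    tokens = ["<s>"] + list(order) + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def generate(words, context_counts, bigram_counts, vocab_size):
    """Return the highest-scoring permutation of an unordered word set."""
    return max(
        permutations(words),
        key=lambda order: log_score(order, context_counts, bigram_counts, vocab_size),
    )


if __name__ == "__main__":
    model = train_bigrams(TRAIN)
    print(generate(["rice", "boy", "eats", "the"], *model))
```

With this toy data, `generate(["rice", "boy", "eats", "the"], *model)` returns `("the", "boy", "eats", "rice")`. Exhaustive search over permutations is workable at the sentence lengths evaluated in the abstract, since a 6-word set has at most 720 orderings.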
