Text Corpus for Natural Language Story-telling Sentence Generation: A Design and Evaluation

Abstract

Automatic generation of narrative sentences from unordered word sets is desirable in Augmentative and Alternative Communication (AAC) systems for children with certain learning disabilities (LD). Regardless of the complexity of the Natural Language Processing deployed in sentence generation procedures, the qualities of language models always affect the generation results. This work compared sentence generation accuracies obtained from a multi-tier N-gram-based procedure trained on BEST2010, a large publicly available text corpus, and a smaller but more specifically designed corpus in the task of Thai simple sentence generation. The latter, a new corpus called TELL-S, was created based on an analysis of the contents belonging to textbooks used in grade 1 and grade 2 for Thai language subjects according to the compulsory curriculum for Thai schools. The original procedure was also modified to incorporate additional constraints based on a story-telling guideline developed for LD children. Evaluated upon test sets of 195 sentences, each of which was composed of 3-6 words with a specific Part-Of-Speech combination, TELL-S was shown to provide better generalization and yielded higher accuracies than BEST2010 in all cases with unbiased word sets. The sentence generation accuracies were 100% and 70% for 3-word and 4-word sentences, respectively. The average accuracy was at 58.8% when longer sentences were also included.
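As a rough illustration of the core idea in the abstract (ordering an unordered word set with an N-gram language model), the following is a minimal sketch and not the authors' multi-tier procedure. It assumes a plain bigram model with add-one smoothing, exhaustive search over permutations, and a tiny English toy corpus; the names `train_bigram`, `log_score`, `generate_sentence`, and `TOY_CORPUS` are hypothetical, and neither BEST2010 nor TELL-S is used here.

```python
import math
from collections import Counter
from itertools import permutations

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (an assumption of this sketch)


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = [BOS, *words, EOS]
        vocab.update(padded)
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def log_score(order, model):
    """Add-one smoothed bigram log-probability of one candidate word order."""
    contexts, bigrams, vocab_size = model
    padded = [BOS, *order, EOS]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def generate_sentence(words, model):
    """Return the highest-scoring permutation of the given words."""
    return max(permutations(words), key=lambda order: log_score(order, model))


# Toy training data, invented for illustration only.
TOY_CORPUS = [
    ["the", "dog", "eats", "rice"],
    ["the", "cat", "eats", "fish"],
    ["a", "dog", "sees", "the", "cat"],
]

model = train_bigram(TOY_CORPUS)
best = generate_sentence(["eats", "dog", "the"], model)
print(" ".join(best))  # the dog eats
```

Exhaustive permutation search is only practical for short inputs such as the 3-6 word sets mentioned above (at most 720 candidate orders for 6 words). The sketch also shows why the training corpus matters: a word order can only score well if its bigrams are attested in the corpus.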

Bibliographic details

Source
International Joint Conference on Computer Science and Software Engineering | 2014 | 6 pages
Authors
Worasa Limpanadusadee; Proadpran Punyabukkana; Atiwong Suchato; Onintra Poobrasert;
Original format: PDF
Chinese Library Classification (CLC): TP311.5-53
Keywords
Natural Language Generation; N-Gram Model; Augmentative and Alternative Communication; Statistical Natural Language Processing; Learning Disabilities; Corpus Management;

Similar literature

1. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools [J]. Karin Verspoor, Kevin B Cohen, Arrick Lanfranchi. BMC Bioinformatics. 2012, No. 1
2. Note from the editor: 'Expositions of Romanian scientists on the design of text-to-speech synthesis and natural language understanding and generation systems' [J]. Amy Neustein. International Journal of Speech Technology. 2009, No. 2-3
3. Converting Natural Language Text Sentences into SPN Representations for Associating Events [J]. Nikolaos Bourbakis, Michael Mills. International Journal of Semantic Computing. 2012, No. 3
4. Text Corpus for Natural Language Story-telling Sentence Generation: A Design and Evaluation [C]. Worasa Limpanadusadee, Proadpran Punyabukkana, Atiwong Suchato, Onintra Poobrasert. International Joint Conference on Computer Science and Software Engineering. 2014
5. A foundation for general-purpose natural language generation: Sentence realization using probabilistic models of language [D]. Langkilde-Geary, Irene. 2003
6. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools [O]. Karin Verspoor, Kevin Bretonnel Cohen, Arrick Lanfranchi. 2012
7. Revision-Based Generation of Natural Language Summaries Providing Historical Background: Corpus-Based Analysis, Design, Implementation and Evaluation [O]. Robin Jacques. 1994
