
Development of a Song Lyric Corpus for the English Language



Abstract

Web scraping tools simplify the creation of large databases for a range of applications, including the construction of corpora for natural language processing (NLP). Many NLP applications require large amounts of data, and the Web is an important source of it. Among NLP tasks, one of the most challenging is automatic text generation, where the objective is to generate syntactically and semantically correct text after training on a particular corpus. This article presents the construction of an English song lyrics corpus, extracted from the Web, that can be used to train applications for the automatic generation of lyrics or poems, or for other NLP-related tasks. An analysis of the normalized corpus is presented, together with analyses of the corpus embeddings (vectorizations) generated with two current techniques.
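The abstract does not state which normalization rules or corpus statistics the authors used. As a rough illustration only, the following sketch shows one plausible way to normalize scraped lyrics and compute basic token counts. The function names `normalize_lyric` and `corpus_stats`, the removal of bracketed section tags, and the ASCII-only tokenization rule are all assumptions, not the authors' method.

```python
import re
from collections import Counter


def normalize_lyric(text: str) -> list[str]:
    """Lowercase a lyric, drop bracketed tags such as [Chorus], and tokenize.

    Assumption: tokens are runs of ASCII letters, optionally with one
    internal apostrophe (e.g. "don't"). Real lyrics may need more care.
    """
    text = text.lower()
    text = re.sub(r"\[[^\]]*\]", " ", text)  # remove annotations like [chorus]
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def corpus_stats(lyrics: list[str]) -> dict:
    """Return the total token count, vocabulary size, and per-token counts."""
    counts = Counter()
    for lyric in lyrics:
        counts.update(normalize_lyric(lyric))
    return {
        "tokens": sum(counts.values()),  # total number of tokens
        "types": len(counts),            # number of distinct tokens
        "counts": counts,
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for scraped lyrics.
    sample = ["[Chorus]\nLove me, love me", "Don't stop the music"]
    stats = corpus_stats(sample)
    print(stats["tokens"], stats["types"])
```

The token lists produced this way could then be passed to an embedding method; the abstract does not name the two techniques used, so none is assumed here.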

