Journal: PLoS One

The impact of differences in text segmentation on the automated quantitative evaluation of song-lyrics



Abstract

The text-evaluation application Coh-Metrix and natural language processing rely on the sentence for text segmentation and analysis and frequently detect sentence limits by means of punctuation. Problems arise when target texts such as pop song lyrics do not follow formal standards of written text composition and lack punctuation in the original. In such cases it is common for human transcribers to prepare texts for analysis, often following unspecified or at least unreported rules of text normalization and relying potentially on an assumed shared understanding of the sentence as a text-structural unit. This study investigated whether the use of different transcribers to insert typographical symbols into song lyrics during the pre-processing of textual data can result in significant differences in sentence delineation. Results indicate that different transcribers (following commonly agreed-upon rules of punctuation based on their extensive experience with language and writing as language professionals) can produce differences in sentence segmentation. This has implications for the analysis results for at least some Coh-Metrix measures and highlights the problem of transcription, with potential consequences for quantification at and above sentence level. It is argued that when analyzing non-traditional written texts or transcripts of spoken language it is not possible to assume uniform text interpretation and segmentation during pre-processing. It is advisable to provide clear rules for text normalization at the pre-processing stage, and to make these explicit in documentation and publication.
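The effect described in the abstract can be illustrated with a minimal sketch. It assumes a naive punctuation-based splitter, which is not Coh-Metrix's actual segmentation method, and an invented lyric punctuated in two plausible ways by two hypothetical transcribers. The function names and the sample text are illustrative assumptions, not material from the study.

```python
import re


def split_sentences(text: str) -> list:
    """Naive splitter (an assumption for illustration): a sentence ends at
    '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def mean_sentence_length(text: str) -> float:
    """Mean number of whitespace-separated words per detected sentence."""
    sentences = split_sentences(text)
    return sum(len(s.split()) for s in sentences) / len(sentences)


# The same invented lyric, punctuated differently by two hypothetical transcribers.
transcriber_a = (
    "I walked the road alone. "
    "The night was cold, the stars were gone. "
    "I called your name."
)
transcriber_b = (
    "I walked the road alone, the night was cold. "
    "The stars were gone, I called your name."
)

for label, text in (("A", transcriber_a), ("B", transcriber_b)):
    print(label, len(split_sentences(text)), round(mean_sentence_length(text), 2))
```

The words are identical in both versions, yet the splitter finds three sentences in one and two in the other, so any measure computed per sentence (here, mean sentence length) shifts with the transcriber's punctuation choices alone.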
