Journal: Expert Systems with Applications

Explicit song lyrics detection with subword-enriched word embeddings



Abstract

In this paper, we investigate the problem of automatically detecting explicit song lyrics, i.e., determining whether the lyrics of a given song could be offensive or unsuitable for children. The problem can be framed as a binary classification task, and in this work we propose to tackle it with the FASTTEXT classifier, an efficient linear classification model built on a distinctive distributional text representation: by exploiting subword information when building word embeddings, it can cope with words not seen at training time. We assess the performance of the FASTTEXT classifier and its word representations on a lyrics dataset of over 800K songs, annotated with explicit-content labels, that we assembled from publicly available resources. The evaluation shows that the FASTTEXT classifier is effective for explicit lyrics detection, substantially outperforming a reference approach for the task, and that the subword information contributes to this result. (C) 2020 Elsevier Ltd. All rights reserved.
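
The abstract's point about subword information can be illustrated with a small sketch. It is not the authors' code and does not call the fastText library; it only shows the general idea of splitting a word into character n-grams, with `<` and `>` marking the word boundaries, so that an unseen word still shares pieces with words seen during training. The function names `char_ngrams` and `shared_ngrams`, and the default n-gram range of 3 to 6, are assumptions chosen for illustration.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from the same characters inside a word.
    The range min_n..max_n is an illustrative assumption.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def shared_ngrams(a, b, min_n=3, max_n=6):
    """Return the set of character n-grams that two words have in common."""
    return set(char_ngrams(a, min_n, max_n)) & set(char_ngrams(b, min_n, max_n))


if __name__ == "__main__":
    # A word seen in training and a creative respelling of it still overlap.
    print(char_ngrams("where", 3, 3))
    print(sorted(shared_ngrams("where", "when", 3, 3)))
```

In a subword-enriched model, a word's representation is built from the vectors of such n-grams, which is why a misspelled or previously unseen word can still receive a meaningful embedding.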
