Conference: Workshop on NLP for Similar Languages, Varieties and Dialects

Towards Augmenting Lexical Resources for Slang and African American English



Abstract

Researchers in natural language processing have developed large, robust resources for understanding formal Standard American English (SAE), but we lack similar resources for variations of English, such as slang and African American English (AAE). In this work, we use word embeddings and clustering algorithms to group semantically similar words in three datasets, two of which contain high incidence of slang and AAE. Since high-quality clusters would contain related words, we could also infer the meaning of an unfamiliar word based on the meanings of words clustered with it. After clustering, we compute precision and recall scores using WordNet and ConceptNet as gold standards and show that these scores are unimportant when the given resources do not fully represent slang and AAE. Amazon Mechanical Turk and expert evaluations show that clusters with low precision can still be considered high quality, and we propose the new Cluster Split Score as a metric for machine-generated clusters. These contributions emphasize the gap in natural language processing research for variations of English and motivate further work to close it.
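
The abstract does not say exactly how precision and recall are computed against WordNet and ConceptNet. The sketch below is one common pairwise formulation, offered only as an illustration under stated assumptions: two words sharing a cluster count as a correct pair when the gold resource lists them as related, and recall counts only gold pairs whose words both appear in the clustered vocabulary. The function `cluster_precision_recall` and the toy `gold_related` set (a stand-in for WordNet/ConceptNet) are hypothetical, and the Cluster Split Score is not reproduced because the abstract does not define it.

```python
from itertools import combinations


def cluster_precision_recall(clusters, gold_related):
    """Pairwise precision/recall of hard clusters against a gold standard.

    ASSUMPTION: this is an illustrative pairwise scheme, not the paper's method.
    clusters: list of sets of words (each word assumed to be in one cluster).
    gold_related: set of frozenset({w1, w2}) pairs the gold resource marks related.
    """
    # Every unordered pair of words that share a cluster is a predicted relation.
    predicted = {
        frozenset(pair)
        for cluster in clusters
        for pair in combinations(cluster, 2)
    }
    # Restrict gold pairs to words that were actually clustered.
    vocab = set().union(*clusters)
    gold = {pair for pair in gold_related if pair <= vocab}

    true_pos = predicted & gold
    precision = len(true_pos) / len(predicted) if predicted else 0.0
    recall = len(true_pos) / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): the gold standard knows "awesome"/"great"
# but has no entries for the slang terms "lit" or "bae".
clusters = [{"lit", "awesome", "great"}, {"bae", "girlfriend"}]
gold_related = {frozenset({"awesome", "great"}), frozenset({"girlfriend", "partner"})}
print(cluster_precision_recall(clusters, gold_related))
```

In this toy case only one of the four predicted pairs appears in the gold set, so precision is low even though "lit" and "bae" are plausibly grouped well. This mirrors the abstract's point that gold-standard scores say little when the resource under-represents slang and AAE.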
