Venue: Conference on Empirical Methods in Natural Language Processing

Splitting Noun Compounds via Monolingual and Bilingual Paraphrasing: A Study on Japanese Katakana Words

Abstract

In a number of languages, unlike English, word boundaries within noun compounds are not marked by whitespace, and splitting such compounds benefits various NLP applications. In Japanese, noun compounds made up of katakana words (i.e., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and often out-of-vocabulary. To overcome this difficulty, we propose using monolingual and bilingual paraphrases of katakana noun compounds to identify word boundaries. Experiments demonstrated that splitting accuracy is substantially improved by extracting such paraphrases from unlabeled text (the Web, in our case) and then using that information to construct splitting models.
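The core intuition, that a paraphrase written with explicit boundaries reveals where an unsegmented katakana compound should be cut, can be illustrated with a toy sketch. This is not the authors' splitting model. It assumes that a candidate split is supported when the corpus contains the same parts joined by the katakana middle dot "・" or by the particle "の". A small in-memory string stands in for Web text, and bilingual paraphrases are omitted entirely. The function names (`candidate_splits`, `paraphrase_count`, `split_compound`) and the example compound "アンチョビパスタ" (anchovy pasta) are hypothetical.

```python
from itertools import combinations


def candidate_splits(compound, max_parts=3, min_len=2):
    """Enumerate ways to cut `compound` into 1..max_parts segments,
    each at least `min_len` characters long (hypothetical helper)."""
    n = len(compound)
    results = []
    for k in range(max_parts):  # k = number of cut points
        # Cut positions leave at least min_len chars for the first and last segment.
        for cuts in combinations(range(min_len, n - min_len + 1), k):
            bounds = (0, *cuts, n)
            parts = [compound[a:b] for a, b in zip(bounds, bounds[1:])]
            if all(len(p) >= min_len for p in parts):
                results.append(tuple(parts))
    return results


def paraphrase_count(parts, corpus):
    """Count corpus occurrences of assumed monolingual paraphrases:
    the parts joined by the middle dot '・' or by the particle 'の'."""
    if len(parts) < 2:
        return 0
    patterns = ["・".join(parts), "の".join(parts)]
    return sum(corpus.count(p) for p in patterns)


def split_compound(compound, corpus, max_parts=3, min_len=2):
    """Return the split with the most paraphrase evidence; if no
    candidate has any evidence, leave the compound unsplit."""
    best, best_score = (compound,), 0
    for parts in candidate_splits(compound, max_parts, min_len):
        score = paraphrase_count(parts, corpus)
        if score > best_score:  # strict '>' keeps the earlier, fewer-part split on ties
            best, best_score = parts, score
    return best


if __name__ == "__main__":
    toy_corpus = "今日はアンチョビ・パスタを作った。アンチョビのパスタが好き。"
    print(split_compound("アンチョビパスタ", toy_corpus))
```

With this toy corpus, "アンチョビパスタ" is split into ("アンチョビ", "パスタ"), because both "アンチョビ・パスタ" and "アンチョビのパスタ" occur. A compound with no paraphrase evidence is returned whole. Candidates are enumerated in order of increasing cut count, so ties favour fewer segments. A real system would use such counts as features in a learned model rather than as the sole decision rule.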