Venue: 15th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

A Comparison of Entity Matching Methods between English and Japanese Katakana

Abstract

Japanese Katakana is one component of the Japanese writing system and is used to write English terms, loanwords, and onomatopoeia in Japanese characters based on their phonemes. The main purpose of this research is to find the best entity matching methods between English and Katakana. We pose two research questions to clarify which types of entity matching systems work better than others. The first question is which transliteration should be used for conversion. We need to transliterate English or Katakana terms into the same form in order to compute string similarity. We consider five conversions: English to Katakana directly, Katakana to English directly, English to Katakana via phonemes, Katakana to English via phonemes, and both English and Katakana to phonemes. The second question is which similarity measure should be used for entity matching. To investigate this, we choose six methods: Overlap Coefficient, Cosine, Jaccard, Jaro-Winkler, Levenshtein, and the similarity of the phoneme probabilities predicted by an RNN. Our results show that 1) matching using phonemes and conversion of Katakana to English work better than the other methods, and 2) phoneme similarity outperforms the other measures, whose scores vary depending on the data and models.
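
To make the string-based measures named in the abstract concrete, here is a minimal Python sketch of four of them (Overlap Coefficient, Jaccard, Cosine, and Levenshtein), applied to two strings that have already been converted to the same script. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how strings are tokenized or how Levenshtein distance is turned into a similarity, so the character-bigram tokenization, the length normalization, and all function names below are hypothetical choices. Jaro-Winkler and the RNN-based phoneme similarity are left out.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Character n-grams of text (assumed tokenization, not from the paper)."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def overlap_coefficient(a: str, b: str, n: int = 2) -> float:
    """|X ∩ Y| / min(|X|, |Y|) over sets of character n-grams."""
    x, y = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


def jaccard(a: str, b: str, n: int = 2) -> float:
    """|X ∩ Y| / |X ∪ Y| over sets of character n-grams."""
    x, y = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def cosine(a: str, b: str, n: int = 2) -> float:
    """Cosine similarity between n-gram count vectors."""
    x, y = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    dot = sum(x[g] * y[g] for g in x.keys() & y.keys())
    norm = (math.sqrt(sum(v * v for v in x.values()))
            * math.sqrt(sum(v * v for v in y.values())))
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance normalized by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0
```

For example, the two common Katakana spellings of "computer", `"コンピュータ"` and `"コンピューター"`, differ by one trailing long-vowel mark, so `levenshtein` returns 1 for that pair. The same functions work unchanged on romanized or phoneme strings, which is why the choice of conversion direction and the choice of similarity measure can be varied independently.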