International Joint Conference on Natural Language Processing

Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization

Abstract

Social media texts, such as tweets from Twitter, contain many types of nonstandard tokens, and the number of normalization approaches for handling such noisy text has been increasing. We present a method for automatically extracting pairs of a variant word and its normal form from unsegmented text on the basis of a pair-wise similarity approach. We incorporated the acquired variant-normalization pairs into Japanese morphological analysis. The experimental results show that our method can extract widely covered variants from large Twitter data and improve the recall of normalization without degrading the overall accuracy of Japanese morphological analysis.
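
The abstract does not say which similarity measure or candidate-generation step the authors use, so the following is only a minimal sketch of the general idea of pairing a variant with its normal form by pair-wise similarity. It assumes that candidate strings have already been pulled out of the text (the paper works from unsegmented text, which this sketch does not attempt), and it substitutes the character-level ratio from Python's `difflib` for the paper's measure. The names `similarity` and `extract_pairs`, the threshold of 0.7, and the toy lexicon and candidates are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in, not the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def extract_pairs(candidates, lexicon, threshold=0.7):
    """Pair each out-of-lexicon candidate with its most similar lexicon entry.

    A pair is kept only if its similarity reaches the (assumed) threshold.
    """
    pairs = []
    for cand in candidates:
        if cand in lexicon:
            continue  # already a normal form, nothing to normalize
        best = max(lexicon, key=lambda w: similarity(cand, w))
        score = similarity(cand, best)
        if score >= threshold:
            pairs.append((cand, best, score))
    return pairs


# Toy data (illustrative only): a lengthened vowel and a small-kana variant.
lexicon = ["すごい", "かわいい", "ありがとう"]
candidates = ["すごーい", "かわいぃ", "ありがとう", "ねこ"]

pairs = extract_pairs(candidates, lexicon)
for variant, normal, score in pairs:
    print(f"{variant} -> {normal} ({score:.2f})")
```

In this toy run, the two nonstandard spellings are paired with their dictionary forms, while the in-lexicon word and the unrelated word produce no pair. Pairs acquired this way could then be added to a morphological analyzer's dictionary, which is the integration step the abstract describes.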