Conference: International Joint Conference on Natural Language Processing

Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization


Abstract

Social media texts, such as tweets from Twitter, contain many types of nonstandard tokens, and a growing number of normalization approaches have been proposed for handling such noisy text. We present a method for automatically extracting pairs of a variant word and its normal form from unsegmented text on the basis of a pair-wise similarity approach. We incorporated the acquired variant-normalization pairs into Japanese morphological analysis. The experimental results show that our method can extract a wide range of variants from large-scale Twitter data and improve the recall of normalization without degrading the overall accuracy of Japanese morphological analysis.
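To illustrate what "pair-wise similarity" between a variant and a normal form can look like, here is a minimal sketch. It is not the method from the paper. It assumes that candidate strings have already been pulled out of the unsegmented text (the paper's extraction step is not reproduced here) and uses normalized character-level Levenshtein distance as a stand-in similarity measure. The function names, the toy lexicon, and the `threshold` value are all hypothetical.

```python
"""Hypothetical sketch: pair-wise string similarity for variant/normal-form candidates.

Assumption: similarity = 1 - normalized Levenshtein distance over characters.
This is an illustrative stand-in, not the similarity used in the paper.
"""


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def extract_pairs(variants, lexicon, threshold=0.6):
    """Pair each out-of-lexicon string with its most similar lexicon entry.

    `variants`, `lexicon`, and `threshold` are hypothetical inputs for illustration.
    """
    pairs = []
    for v in variants:
        if v in lexicon:
            continue  # already a standard form, nothing to normalize
        best = max(lexicon, key=lambda w: similarity(v, w))
        if similarity(v, best) >= threshold:
            pairs.append((v, best))
    return pairs


if __name__ == "__main__":
    lexicon = ["すごい", "かわいい", "たのしい"]
    print(extract_pairs(["すごーい", "かわいいいい"], lexicon))
```

With the toy lexicon above, the lengthened variants すごーい and かわいいいい are paired with すごい and かわいい. A plain edit-distance score like this ignores context and character types, which is one reason a real system would likely combine several features rather than rely on a single threshold.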