Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization

Abstract

Social media texts, such as tweets from Twitter, contain many types of nonstandard tokens, and the number of normalization approaches for handling such noisy text has been increasing. We present a method for automatically extracting pairs of a variant word and its normal form from unsegmented text on the basis of a pair-wise similarity approach. We incorporated the acquired variant-normalization pairs into Japanese morphological analysis. The experimental results show that our method can extract a wide range of variants from large-scale Twitter data and improve normalization recall without degrading the overall accuracy of Japanese morphological analysis.
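
The abstract does not say which similarity measures are used, so the following is only a minimal sketch of the general idea of pair-wise similarity matching, not the authors' method. It assumes candidate variant strings have already been pulled out of the text (the paper works from unsegmented text, which this sketch does not attempt) and scores them against a small list of normal forms by character overlap. The function names, the example words, and the 0.7 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def extract_pairs(candidates, normal_forms, threshold=0.7):
    """Return (variant, normal form, score) triples scoring at or above threshold.

    Every candidate is compared with every normal form; identical strings
    are skipped because they are not variants.
    """
    pairs = []
    for variant, normal in product(candidates, normal_forms):
        if variant == normal:
            continue
        score = char_similarity(variant, normal)
        if score >= threshold:
            pairs.append((variant, normal, score))
    # Highest-scoring pairs first.
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical inputs: informal spellings and a tiny normal-form lexicon.
candidates = ["すごーい", "ありがとー", "ねこ"]
normal_forms = ["すごい", "ありがとう", "いぬ"]

pairs = extract_pairs(candidates, normal_forms)
for variant, normal, score in pairs:
    print(f"{variant} -> {normal} ({score:.2f})")
```

A surface-overlap score like this only catches variants that stay close to the normal form in spelling; a real system would likely need further evidence, such as contextual or phonetic similarity, to filter false matches.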

Bibliographic details

Source: International joint conference on natural language processing, 2017, pp. 937-946 (10 pages)
Authors: Itsumi Saito; Kyosuke Nishida; Kugatsu Sadamitsu; Kuniko Saito; Junji Tomita
Original format: PDF
