
Context Tailoring for Text Normalization



Abstract

Language processing tools suffer significant performance drops in the social media domain because its language evolves continuously. Transforming non-standard words into their standard forms has been studied as a step towards proper processing of ill-formed texts. This work describes a normalization system that considers contextual and lexical similarities between standard and non-standard words to remove noise from texts. A bipartite graph representing the contexts shared by words in a large unlabeled text corpus is used to explore normalization candidates via random walks. The input context of a non-standard word in a given sentence is tailored when a direct match to the shared contexts is not possible. The system's performance was evaluated on Turkish social media texts.
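The graph-and-random-walk idea can be illustrated with a minimal sketch. The abstract does not give the system's actual design, so everything below is an assumption made for illustration: a context is taken to be the (left neighbour, right neighbour) pair of a word, walks are short and weighted by co-occurrence counts, lexical similarity is approximated with `difflib.SequenceMatcher`, and the function names (`build_graph`, `candidates`, `normalize`) and the English toy corpus are hypothetical. The context tailoring step is not sketched, because the abstract does not say how it is done.

```python
import random
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def build_graph(sentences):
    """Bipartite graph: words on one side, (left, right) contexts on the other.

    Assumption: a context is the pair of immediate neighbours of a word.
    """
    word_to_ctx = defaultdict(Counter)
    ctx_to_word = defaultdict(Counter)
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            ctx = (toks[i - 1], toks[i + 1])
            word_to_ctx[toks[i]][ctx] += 1
            ctx_to_word[ctx][toks[i]] += 1
    return word_to_ctx, ctx_to_word


def _step(counter, rng):
    """Pick a neighbour with probability proportional to its edge count."""
    items = sorted(counter.items())  # sorted so a fixed seed is reproducible
    keys = [k for k, _ in items]
    weights = [w for _, w in items]
    return rng.choices(keys, weights=weights, k=1)[0]


def candidates(word, word_to_ctx, ctx_to_word, lexicon,
               walks=200, max_hops=4, seed=0):
    """Count how often walks from `word` first reach each standard word."""
    rng = random.Random(seed)
    hits = Counter()
    if word not in word_to_ctx:
        return hits
    for _ in range(walks):
        node = word
        for _ in range(max_hops):
            ctx = _step(word_to_ctx[node], rng)   # word -> context
            node = _step(ctx_to_word[ctx], rng)   # context -> word
            if node in lexicon and node != word:
                hits[node] += 1
                break
    return hits


def normalize(word, word_to_ctx, ctx_to_word, lexicon):
    """Rank candidates by walk frequency times lexical similarity."""
    hits = candidates(word, word_to_ctx, ctx_to_word, lexicon)
    if not hits:
        return word  # no shared context found; leave the word unchanged
    total = sum(hits.values())

    def score(cand):
        contextual = hits[cand] / total
        lexical = SequenceMatcher(None, word, cand).ratio()
        return contextual * lexical

    return max(sorted(hits), key=score)


# Hypothetical toy corpus; "u" is the non-standard word.
corpus = [
    "see you tomorrow morning",
    "see u tomorrow morning",
    "see you tomorrow night",
    "see them tomorrow morning",
    "thank you very much",
    "thank u very much",
]
lexicon = {"see", "you", "them", "tomorrow", "morning",
           "night", "thank", "very", "much"}
word_to_ctx, ctx_to_word = build_graph(corpus)
print(normalize("u", word_to_ctx, ctx_to_word, lexicon))
```

In this toy setup "them" shares a context with "u" but has no characters in common with it, so the lexical term removes it from contention. A word that shares no context with any standard word is returned unchanged, which is the situation the tailoring step in the abstract is meant to handle.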
