
Techniques for automatic normalization of orthographically variant Yiddish texts.


Abstract

Yiddish is characterized by a multitude of orthographic systems. A number of approaches to automatic normalization of variant orthography have been explored for the processing of historic texts of languages whose orthography has since been standardized. However, these approaches have not yet been applied to Yiddish.

Using a manually normalized set of 16 Yiddish documents as a training and test corpus, four techniques for automatic normalization were compared: a hand-crafted set of transformation rules, an off-the-shelf spell checker, edit distance minimization with manually set weights, and edit distance minimization with weights learned through a training set.

Performance was evaluated by calculating the proportion of correctly normalized words in a test set, and by measuring precision and recall in a test of information retrieval.

For the given test corpus, normalization by minimization of edit distance with multi-character edit operations and learned weights was found to perform best in all tests.
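To make the edit-distance approach concrete, the following is a minimal Python sketch of normalization by minimizing a weighted edit distance that allows multi-character operations, together with the word-level accuracy measure mentioned in the abstract. It is not the thesis's implementation: the names `REWRITE_COST`, `DEFAULT_COST`, `LEXICON`, `weighted_distance`, `normalize`, and `accuracy` are hypothetical, the weights are hand-set toy values rather than learned ones, and the romanized ASCII strings are stand-ins for Hebrew-script word forms.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical multi-character rewrite costs (variant substring -> standard substring).
# Toy values for illustration only; they are not taken from the thesis.
REWRITE_COST: Dict[Tuple[str, str], float] = {
    ("sch", "tsh"): 0.1,  # cheap three-character replacement
    ("", "y"): 0.3,       # cheap insertion of "y"
}
DEFAULT_COST = 1.0  # any other single-character insert, delete, or substitute

# Hypothetical lexicon of standardized forms (ASCII stand-ins).
LEXICON: List[str] = ["yidish", "mentsh", "shabes"]


def weighted_distance(variant: str, standard: str) -> float:
    """Cheapest cost of rewriting `variant` into `standard`."""
    n, m = len(variant), len(standard)
    inf = float("inf")
    # dist[i][j] = cheapest cost of turning variant[:i] into standard[:j]
    dist = [[inf] * (m + 1) for _ in range(n + 1)]
    dist[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            cur = dist[i][j]
            if cur == inf:
                continue
            if i < n and j < m:  # match (free) or substitute
                step = 0.0 if variant[i] == standard[j] else DEFAULT_COST
                dist[i + 1][j + 1] = min(dist[i + 1][j + 1], cur + step)
            if i < n:  # delete one character from the variant
                dist[i + 1][j] = min(dist[i + 1][j], cur + DEFAULT_COST)
            if j < m:  # insert one character of the standard form
                dist[i][j + 1] = min(dist[i][j + 1], cur + DEFAULT_COST)
            # multi-character operations from the rewrite table
            for (src, dst), cost in REWRITE_COST.items():
                if variant.startswith(src, i) and standard.startswith(dst, j):
                    ti, tj = i + len(src), j + len(dst)
                    dist[ti][tj] = min(dist[ti][tj], cur + cost)
    return dist[n][m]


def normalize(variant: str, lexicon: Iterable[str]) -> str:
    """Return the lexicon entry with the smallest weighted distance."""
    return min(lexicon, key=lambda word: weighted_distance(variant, word))


def accuracy(pairs: List[Tuple[str, str]], lexicon: List[str]) -> float:
    """Proportion of (variant, gold) pairs whose variant is normalized to gold."""
    correct = sum(normalize(variant, lexicon) == gold for variant, gold in pairs)
    return correct / len(pairs)
```

Under these toy weights, `normalize("idish", LEXICON)` returns `"yidish"` and `normalize("mensch", LEXICON)` returns `"mentsh"`. A learned-weight variant would estimate the entries of `REWRITE_COST` from aligned pairs of variant and normalized words instead of setting them by hand; that estimation step is not shown here.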

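Bibliographic details

  • Author

    Blum, Yakov Peretz

  • Author affiliation

    City University of New York

  • Degree-granting institution: City University of New York
  • Subjects: Linguistics; Information science; Judaic studies
  • Degree: M.A.
  • Year: 2015
  • Pages: 49 p.
  • Total pages: 49
  • Original format: PDF
  • Language of text: English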
