Journal: Computational Linguistics

Identifying Semitic roots: Machine learning with linguistic constraints


Abstract

Words in Semitic languages are formed by combining two morphemes: a root and a pattern. The root consists of consonants only, by default three, and the pattern is a combination of vowels and consonants, with non-consecutive "slots" into which the root consonants are inserted. Identifying the root of a given word is an important task, considered to be an essential part of the morphological analysis of Semitic languages, and information on roots is important for linguistics research as well as for practical applications. We present a machine learning approach, augmented by limited linguistic knowledge, to the problem of identifying the roots of Semitic words. Although programs exist which can extract the root of words in Arabic and Hebrew, they are all dependent on labor-intensive construction of large-scale lexicons which are components of full-scale morphological analyzers. The advantage of our method is an automation of this process, avoiding the bottleneck of having to laboriously list the root and pattern of each lexeme in the language. To the best of our knowledge, this is the first application of machine learning to this problem, and one of the few attempts to directly address non-concatenative morphology using machine learning. More generally, our results shed light on the problem of combining classifiers under (linguistically motivated) constraints.
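
The root-and-pattern structure described in the abstract can be illustrated with a small sketch. This is not the authors' system. The slot marker `C`, the function names, the schematic transliterations (`ktb`, `CaCaC`, `miCCaC`), and the toy scores are all assumptions made for illustration. The first function fills a pattern's slots with root consonants. The second picks a root by combining hypothetical per-position classifier scores under one simplified constraint: the root consonants must appear in the word in the same order. Real Semitic morphology breaks this constraint in some cases, for example when a weak root consonant is dropped.

```python
from itertools import product

SLOT = "C"  # hypothetical marker for a root-consonant slot in a pattern


def interdigitate(root, pattern):
    """Fill the slots of `pattern` with the consonants of `root`, in order."""
    if pattern.count(SLOT) != len(root):
        raise ValueError("number of slots and root length differ")
    consonants = iter(root)
    return "".join(next(consonants) if ch == SLOT else ch for ch in pattern)


def is_subsequence(root, word):
    """True if the root consonants occur in `word` in the same order."""
    remaining = iter(word)
    # `c in remaining` consumes the iterator up to the first match,
    # so later consonants must be found further to the right.
    return all(c in remaining for c in root)


def best_root(word, position_scores):
    """Return the highest-scoring root that satisfies the ordering constraint.

    position_scores holds one dict per root position, mapping a candidate
    consonant to a score (toy stand-ins for per-position classifier outputs).
    Returns None if no candidate combination satisfies the constraint.
    """
    best, best_score = None, float("-inf")
    for combo in product(*(scores.items() for scores in position_scores)):
        root = "".join(consonant for consonant, _ in combo)
        score = sum(s for _, s in combo)
        if is_subsequence(root, word) and score > best_score:
            best, best_score = root, score
    return best


# Toy scores: taken independently, the classifiers prefer "kbt",
# but that root does not occur in order inside "katab".
toy_scores = [
    {"k": 0.9, "t": 0.1},
    {"b": 0.6, "t": 0.4},
    {"t": 0.7, "b": 0.3},
]

print(interdigitate("ktb", "CaCaC"))
print(interdigitate("ktb", "miCCaC"))
print(best_root("katab", toy_scores))
```

With these toy scores, choosing each position independently would give `kbt`, while the constrained search returns `ktb`, the only candidate consistent with the word. The exhaustive search over combinations is chosen for clarity, not efficiency.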