Journal: International Journal of Computer Processing of Languages

Structural Parsing of Natural Language Text in Tamil Language Using Dependency Model



Abstract

Parsing is a core process in Natural Language Processing (NLP) and computational linguistics, used to analyze the syntax and semantics of natural language sentences with respect to a grammar. Parsing models need both syntactic and semantic coverage to interpret natural language sentences well. Although statistical parsing with trigram language models performs well thanks to trigram probabilities and a large vocabulary, it has disadvantages: it lacks support for syntax, free word order, and long-distance relationships, which are the challenging features of the Tamil language. Grammar-based structural parsing addresses these to some extent. To overcome these disadvantages, a structural component must be incorporated into the statistical approach, which results in hybrid models such as phrase structure and dependency models. To add the structural component, balance the vocabulary size, and handle the challenging features, lexicalized and statistical parsing (LSP) is employed with the assistance of hybrid models. For complex and long sentences, the phrase structure model may not be well suited to incorporating all of these features. When dependency relations are applied among words, direct relationships can be established. Lexicalized and statistical parsing of Tamil text using a dependency model will therefore give better performance than parsing with a phrase structure model. New part-of-speech (POS) and dependency tag sets for Tamil have been developed. A treebank of 326 manually annotated sentences, comprising more than 5000 words, has been developed. It has been extended to 1000 sentences using bootstrapping and manual correction, and used to train the dependency model. The LSP with dependency model provides better results and covers all the challenging features of the Tamil language.
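
The contrast the abstract draws between trigram models and dependency relations under free word order can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the glossed tokens (`Raman`, `Sita-ACC`, `saw`), the relation labels (`subj`, `obj`, `root`), and the helper names `trigrams` and `dependency_arcs` are hypothetical and are not taken from the paper's treebank, tag sets, or parser. The sketch only shows that reordering the words changes every trigram, while the head-dependent arcs stay the same.

```python
from typing import FrozenSet, List, Tuple

Arc = Tuple[str, str, str]  # (head word, relation label, dependent word)


def trigrams(tokens: List[str]) -> List[Tuple[str, ...]]:
    """Return consecutive word triples, padded with sentence boundary markers."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def dependency_arcs(tokens: List[str], heads: List[int], labels: List[str]) -> FrozenSet[Arc]:
    """Convert index-based head annotations into word-level arcs.

    heads[i] is the index of the head of tokens[i], or -1 for the root.
    The result is a set, so it does not depend on word order.
    """
    arcs = set()
    for i, (head, label) in enumerate(zip(heads, labels)):
        head_word = "ROOT" if head == -1 else tokens[head]
        arcs.add((head_word, label, tokens[i]))
    return frozenset(arcs)


# Two orderings of the same glossed clause (hypothetical example, not from the paper):
# subject-object-verb and object-subject-verb.
order_a = ["Raman", "Sita-ACC", "saw"]
order_b = ["Sita-ACC", "Raman", "saw"]

# Hand-written annotations: both nouns attach directly to the verb.
arcs_a = dependency_arcs(order_a, heads=[2, 2, -1], labels=["subj", "obj", "root"])
arcs_b = dependency_arcs(order_b, heads=[2, 2, -1], labels=["obj", "subj", "root"])

# Trigrams that the two orderings have in common.
shared_trigrams = set(trigrams(order_a)) & set(trigrams(order_b))

print("same dependency arcs:", arcs_a == arcs_b)
print("shared trigrams:", shared_trigrams)
```

In this toy case the two orderings share no trigram at all, so a trigram model would treat them as unrelated word sequences, whereas the dependency arcs are identical. This is only a picture of the motivation; it says nothing about how the paper's LSP model scores or learns such arcs.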
