
XML Rules for Enclitic Segmentation


Abstract

Segmenting sentences into words is a complex and important task in almost all natural language processing applications. Several works conceal or sidestep the difficulties involved in this process. In some cases they adopt an easy partial solution that is acceptable for certain languages and applications; in others they rely on an earlier or later phase to solve it. However, hardly any papers explain how these earlier or later phases have to be carried out.

In this paper we describe these problems, focusing on part-of-speech tagging tasks, and propose a solution for one of them: the segmentation of verbal forms that contain enclitic pronouns. We present a generic verb processing system that segments and pretags verbs with enclitic pronouns attached to them.

The system does not limit its function to segmentation: it also pretags the different linguistic components of a verbal form with enclitics and removes tags that are invalid in its context. This innovation will be useful for part-of-speech taggers, which can use this information to avoid certain errors and thus improve their results.

Although we have applied the system to Galician, it can be easily adapted to other Romance languages. The generic rule system we have designed allows rules to be written in XML files. This, combined with the use of lexicons, makes the adaptation simple and independent of the system internals.
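
To make the idea of XML-driven enclitic segmentation concrete, here is a minimal Python sketch. It is not the system described in the abstract: the XML element and attribute names, the tag labels, the toy lexicon, and the helper functions `load_rules` and `segment` are all hypothetical, chosen only for illustration. The sketch assumes that a rule lists an enclitic pronoun and its pretag, and that a split is accepted only when the remaining stem is a known verb form in the lexicon.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: element names, attributes, and tag labels are
# assumptions made for this sketch, not the format used in the paper.
RULES_XML = """
<rules>
  <enclitic form="me" tag="PRON:1sg"/>
  <enclitic form="lle" tag="PRON:3sg:dat"/>
  <enclitic form="nos" tag="PRON:1pl"/>
</rules>
"""

# Toy lexicon mapping verb forms to candidate tags (illustrative only).
VERB_LEXICON = {
    "chamou": ["VERB:pret:3sg"],
    "deu": ["VERB:pret:3sg"],
}


def load_rules(xml_text):
    """Parse the rule file into a list of (enclitic form, pretag) pairs."""
    root = ET.fromstring(xml_text)
    return [(e.get("form"), e.get("tag")) for e in root.iter("enclitic")]


def segment(token, rules, lexicon):
    """Return every analysis of token as verb + enclitic.

    Each analysis is a list of (segment, candidate tags) pairs. A split is
    kept only if the stem is a verb form found in the lexicon.
    """
    analyses = []
    for form, tag in rules:
        if token.endswith(form) and len(token) > len(form):
            stem = token[: -len(form)]
            if stem in lexicon:
                analyses.append([(stem, lexicon[stem]), (form, [tag])])
    return analyses


if __name__ == "__main__":
    rules = load_rules(RULES_XML)
    for word in ["chamoume", "deulle", "nome"]:
        print(word, segment(word, rules, VERB_LEXICON))
```

In this sketch the lexicon check is what keeps a noun such as "nome" from being split into a stem plus "me". Adapting it to another language would, under these assumptions, mean swapping the rule file and the lexicon while leaving the code untouched. A realistic system would also need to handle clitic sequences, allomorphs, and accent changes on the verb, none of which are covered here.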
