Verb Clustering for Brazilian Portuguese

Abstract

Levin-style classes, which capture the shared syntax and semantics of verbs, have proven useful for many Natural Language Processing (NLP) tasks and applications. However, lexical resources which provide information about such classes are available for only a handful of the world's languages. Because manual development of such resources is extremely time-consuming and cannot reliably capture domain variation in classification, methods for automatic induction of verb classes from texts have gained popularity. To date, however, such methods have been applied to English and a handful of other, mainly resource-rich languages. In this paper, we apply the methods to Brazilian Portuguese, a language for which neither a VerbNet nor any automatic class induction work exists yet. Since Levin-style classification is said to have a strong cross-linguistic component, we use unsupervised clustering techniques similar to those developed for English, without language-specific feature engineering. This yields interesting results which line up well with those obtained for other languages, demonstrating the cross-linguistic nature of this type of classification. However, we also discover and discuss issues which require specific consideration when aiming to optimise the performance of verb clustering for Brazilian Portuguese and other less-resourced languages.