9th International Conference on Language Resources and Evaluation

A Large Corpus of Product Reviews in Portuguese: Tackling Out-Of-Vocabulary Words

Abstract

Web 2.0 has enabled an unprecedented communication boom. With the widespread use of computers and mobile devices, anyone, writing in practically any language, can post comments on the web, and formal language is not necessarily used. In fact, in these communicative situations, language is marked by the absence of more complex syntactic structures and by the presence of internet slang, with missing diacritics, repeated vowels, chat-speak abbreviations, emoticons, and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which so far have focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and an analysis of its lexical phenomena. This analysis supports the development of a lexical normalization tool that, in future work, will enable standard NLP tools to be used for web opinion mining and summarization.
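The abstract names several lexical phenomena (repeated vowels, missing diacritics, chat-speak abbreviations) that turn ordinary words into out-of-vocabulary tokens. Below is a minimal Python sketch of how a dictionary-based normalizer might map such tokens back to standard forms. It is not the authors' tool, which the abstract leaves to future work: the lexicon and abbreviation table are hypothetical toy data, and collapsing runs of three or more identical letters is an assumed heuristic.

```python
import re
import unicodedata

# Hypothetical toy lexicon of standard Brazilian Portuguese forms (illustration only).
LEXICON = {"muito", "bom", "não", "gostei", "ótimo", "produto", "você", "porque"}

# Hypothetical chat-speak abbreviation table (illustration only).
ABBREVIATIONS = {"vc": "você", "pq": "porque"}


def strip_diacritics(word: str) -> str:
    """Remove combining accent marks, e.g. 'não' -> 'nao'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Map each accent-free form back to its accented standard spelling.
# Note: if two lexicon words share an accent-free form, only one survives here.
ACCENT_INDEX = {strip_diacritics(w): w for w in LEXICON}


def collapse_repeats(word: str) -> str:
    """Collapse runs of 3+ identical letters to one, e.g. 'muuuito' -> 'muito'."""
    return re.sub(r"(\w)\1{2,}", r"\1", word)


def normalize_token(token: str) -> str:
    """Return a standard form for a token, or the token unchanged if none is found."""
    word = token.lower()
    if word in LEXICON:
        return word  # already in vocabulary
    if word in ABBREVIATIONS:
        return ABBREVIATIONS[word]  # expand chat-speak abbreviation
    candidate = collapse_repeats(word)
    if candidate in LEXICON:
        return candidate  # vowel/letter repetition removed
    restored = ACCENT_INDEX.get(strip_diacritics(candidate))
    if restored is not None:
        return restored  # missing diacritics restored
    return token  # still out of vocabulary


def normalize(text: str) -> list[str]:
    """Normalize a whitespace-tokenized review."""
    return [normalize_token(t) for t in text.split()]


print(normalize("muuuito bom nao gostei"))  # ['muito', 'bom', 'não', 'gostei']
```

Tokens that no rule resolves, such as the laughter form `kkkk`, are returned unchanged. A realistic normalizer would need a far larger lexicon and a way to rank competing candidates, for instance when one accent-free form corresponds to several accented words.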