Home > Foreign-Language Journals > Current Organic Synthesis > Issues of Dialectal Saudi Twitter Corpus

Issues of Dialectal Saudi Twitter Corpus



Abstract

Text mining research relies heavily on the availability of a suitable corpus. This paper presents a dialectal Saudi corpus containing 207,452 tweets generated by Saudi Twitter users. In addition, the Saudi tweets dataset was compared with an Egyptian Twitter corpus and an Arabic top news raw corpus (representing Modern Standard Arabic (MSA)) in various aspects, such as the differences between formal and colloquial texts. Moreover, the issues and phenomena found in this type of dataset, such as shortening, concatenation, colloquial language, compounding, foreign language, spelling errors and neologisms, were investigated.
