
Enriching Arabic tweets representation based on web search engine and the rough set theory


Abstract

Twitter is a popular micro-blogging service where users search for timely and social information. Users post short text messages called tweets, which are limited in length. Tweets differ from traditional documents in their shortness and sparseness. As a result, short text tends to be ambiguous without enough contextual information. To address these issues, we propose an efficient method to enrich the representation of Arabic tweets, using a web search engine as a large, open corpus together with Rough Set Theory, a mathematical tool for dealing with vagueness and uncertainty. To assess the performance of the proposed system, a series of experiments was conducted. The effectiveness of our system was evaluated and compared in terms of the F1-measure using the Naive Bayes (NB) and Support Vector Machine (SVM) classifiers in our Arabic Tweets Categorization System. The results show that enriching the tweet representation significantly increases the F1-measure of the Arabic tweet categorization system.
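The abstract does not say which rough set formulation is used or how search results are retrieved, so the following is only a minimal sketch of one plausible reading. It assumes a tolerance rough set model, in which two terms are treated as related when they co-occur in at least `theta` search snippets, and it enriches a tweet with its upper approximation. The snippets are hypothetical, hard-coded token lists standing in for search engine results (English placeholders rather than Arabic tokens), and the names `tolerance_classes`, `upper_approximation`, and `theta` are illustrative, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(snippets, theta):
    """Map each term to itself plus every term that co-occurs with it
    in at least `theta` snippets (assumed tolerance relation)."""
    cooc = defaultdict(int)
    vocab = set()
    for snippet in snippets:
        terms = set(snippet)
        vocab |= terms
        # Count each unordered pair once per snippet, in both directions.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    classes = {term: {term} for term in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
    return classes


def upper_approximation(tweet_terms, classes):
    """Return the tweet's terms plus every term whose tolerance class
    overlaps the tweet (the rough set upper approximation)."""
    tweet = set(tweet_terms)
    enriched = set(tweet)
    for term, cls in classes.items():
        if cls & tweet:
            enriched.add(term)
    return enriched


# Hypothetical snippets that a search engine might return for a tweet.
snippets = [
    ["football", "match", "goal"],
    ["football", "goal", "league"],
    ["weather", "rain"],
]
classes = tolerance_classes(snippets, theta=2)
enriched = upper_approximation(["football"], classes)
```

Under these assumptions the enriched term set would replace the original bag of words before training the NB or SVM classifier; the threshold `theta` controls how aggressively the tweet is expanded.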
