Conference: IEEE Global Communications Conference

Attributing Authors of Emirati Tweets

Abstract

Electronic text Author Attribution (AA) is a well-known stylometry problem that attempts to infer the identity of the authors of disputed electronic texts solely by analyzing the texts themselves. This is important for applications such as forensics and market analysis. However, the state of the art in author identification has never been evaluated against Emirati social media texts, partly because no suitable evaluation dataset exists for this domain. This paper presents the first such evaluation, along with the release of the Khonji-Iraqi Emirati Tweets author identification evaluation dataset with 30 authors (KIT30). Additionally, novel definitions of grams, namely compound grams, are introduced; decision models that use them are shown to achieve higher classification accuracy than models that follow classical definitions of grams. The findings also indicate that, when a suitable data representation is used, the degradation in classification accuracy as the space of suspect authors grows is not necessarily as sharp as previously reported in the literature. This suggests that AA problem solvers can be significantly more scalable than previously evaluated.
