Improving Authorship Attribution in Twitter Through Topic-Based Sampling

Abstract

Aliases are used as a means of anonymity on the Internet in environments such as IRC (Internet Relay Chat), forums and micro-blogging websites such as Twitter. While there are genuine reasons for the use of aliases, such as journalists operating in politically oppressive countries, they are increasingly being used by cybercriminals and extremist organisations. Recent years have seen increased research on authorship attribution of Twitter messages, including authorship analysis of aliases. Previous studies have shown that anti-aliasing of randomly generated sub-aliases yields high accuracy when linking the sub-aliases, but becomes much less accurate when topic-based sub-aliases are used. N-gram methods have previously been demonstrated to perform better than other methods in this situation. This paper investigates the effect of topic-based sampling on authorship attribution accuracy for the popular micro-blogging website Twitter. Features are extracted using character n-grams, which accurately capture differences in authorship style. These features are classified with support vector machines in a one-versus-all configuration. The predictive performance of the algorithm is then evaluated under two different sampling methodologies: authors sampled through a context-sensitive topic-based search, and authors sampled randomly. Topic-based sampling of authors is found to produce more accurate authorship predictions. This paper presents several theories as to why this might be the case.
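The pipeline described in the abstract (character n-gram features fed to one-versus-all support vector machines) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the trigram size, the learning rate, the regularisation strength, the subgradient-descent training loop, the helper names, and the toy messages and author labels are all assumptions made for illustration. A real experiment would more likely rely on an established SVM library.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (case-folded) in a message."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def featurize(text, n=3):
    """Sparse, L2-normalised n-gram vector as a dict {ngram: weight}."""
    counts = char_ngrams(text, n)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {gram: c / norm for gram, c in counts.items()}


def score(weights, bias, x):
    """Linear decision value w.x + b for a sparse vector x."""
    return sum(weights.get(g, 0.0) * v for g, v in x.items()) + bias


def train_binary_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM trained by subgradient descent on the regularised hinge loss.

    y holds +1 (target author) or -1 (every other author).
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            violated = t * score(weights, bias, x) < 1
            for g in weights:              # L2 shrinkage step
                weights[g] *= 1 - lr * lam
            if violated:                   # hinge-loss subgradient step
                for g, v in x.items():
                    weights[g] = weights.get(g, 0.0) + lr * t * v
                bias += lr * t
    return weights, bias


def train_one_vs_all(texts, authors, n=3):
    """Train one binary SVM per author: that author versus all others."""
    X = [featurize(t, n) for t in texts]
    return {
        a: train_binary_svm(X, [1 if b == a else -1 for b in authors])
        for a in sorted(set(authors))
    }


def predict(model, text, n=3):
    """Attribute the text to the author whose classifier scores it highest."""
    x = featurize(text, n)
    return max(model, key=lambda a: score(*model[a], x))


# Hypothetical toy corpus: two invented authors with distinct styles.
texts = [
    "lol cant wait 4 the game 2nite!!!",
    "I shall attend the meeting tomorrow.",
    "omg lol that was gr8!!!",
    "Indeed, the report was excellent.",
    "lol c u 2moro!!!",
    "I shall review the document shortly.",
]
authors = ["alice", "bob", "alice", "bob", "alice", "bob"]

model = train_one_vs_all(texts, authors)
print(predict(model, "lol cant wait 4 the game!!!"))
print(predict(model, "I shall attend the meeting shortly."))
```

Comparing the two sampling methodologies would then amount to running the same training and evaluation once on topic-sampled authors and once on randomly sampled authors, and comparing the resulting accuracies.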

Bibliographic details

Source: Australasian Joint Conference on Artificial Intelligence | 2017 | 376p | 12 pages in total
Authors: Luoxi Pan; Iqbal Gondal; Robert Layton
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Authorship attribution; Twitter authorship; Linguistic analysis

