Authorship Discovery in Blogs Using Bayesian Classification with Corrective Scaling; Master's thesis

Abstract

Widespread availability of free, public blog platforms has facilitated growth in the amount of individually written electronic text available online. Our research leverages an extremely large blog corpus for a study in authorship discovery, both to evaluate a traditional technique as applied to blogs and to demonstrate the implications of authorship discovery in blogs for intelligence and forensic purposes. Our study uses a Bayesian classifier with two important extensions. First, we introduce a postclassification corrective scaling technique to mitigate the overclassification of many samples to a few authors. Second, we propose an n-percent-correct threshold metric, whereby we define a correct result as one where the true author is within some small subset of the original search space rather than requiring that he or she be the single most probable author. Using this technique, we are able to reduce a search space of 2000 authors to 1% of its original size with 91% accuracy when 1000 bigrams are present, or reduce the search space to 10% of its original size with 94% accuracy when only 500 bigrams are present.
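
The n-percent-correct metric described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`shortlist_size`, `n_percent_correct`, `accuracy_at_n`), the score dictionary layout, and the choice to round the shortlist size up are all assumptions made for illustration. The corrective scaling step is not sketched, because the abstract does not specify it.

```python
import math


def shortlist_size(num_authors, n_percent):
    """Number of candidates kept when the search space is cut to n_percent.

    Rounding up (and keeping at least one candidate) is an assumption.
    """
    return max(1, math.ceil(num_authors * n_percent / 100))


def n_percent_correct(scores, true_author, n_percent):
    """True if true_author ranks within the top n_percent of candidates.

    scores maps a hypothetical author id to a classifier score, where a
    higher score means a more probable author (e.g. a log posterior).
    """
    k = shortlist_size(len(scores), n_percent)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return true_author in ranked[:k]


def accuracy_at_n(samples, n_percent):
    """Fraction of (scores, true_author) samples counted as correct."""
    hits = sum(n_percent_correct(s, a, n_percent) for s, a in samples)
    return hits / len(samples)
```

Under these assumptions, `shortlist_size(2000, 1)` gives 20 candidates and `shortlist_size(2000, 10)` gives 200, which correspond to the 1% and 10% search spaces quoted in the abstract.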
