
Author Clustering Using SPATIUM



Abstract

This paper presents the author clustering problem and compares it with related authorship attribution questions. The proposed model is based on a distance measure called Spatium, derived from the Canberra measure (a weighted version of the L1 norm). The selected features are the 200 most frequent words and punctuation symbols. An evaluation methodology is presented, and the test collections are extracted from the PAN CLEF 2016 evaluation campaign. In addition, we consider two corpora that more closely reflect the literary domain. Across four languages, the evaluation measures show high precision and F1 scores for all 20 test collections. A more detailed analysis explains some of the failures of the Spatium model.
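The distance underlying the model can be sketched in Python. This is a minimal illustration, not the authors' Spatium implementation: it computes the plain Canberra distance, d(A, B) = Σ_i |a_i − b_i| / (|a_i| + |b_i|), over the relative frequencies of the k most frequent tokens (words and punctuation symbols), with k = 200 as in the abstract. The tokenizer regex, the use of relative frequencies, building the vocabulary from the whole collection, and the helper names (`tokenize`, `most_frequent_terms`, `profile`, `canberra`) are assumptions made for illustration. How Spatium departs from Canberra, and how the distances are turned into clusters, are not shown.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase word tokens plus single punctuation symbols.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split a text into lowercase words and individual punctuation marks."""
    return TOKEN_RE.findall(text.lower())


def most_frequent_terms(texts, k=200):
    """Hypothetical helper: the k most frequent tokens over the whole collection."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [term for term, _ in counts.most_common(k)]


def profile(text, vocab):
    """Relative frequency of each vocabulary term in one text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty texts
    return [counts[term] / n for term in vocab]


def canberra(p, q):
    """Plain Canberra distance: an L1 distance with per-term weights 1/(|a|+|b|)."""
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom > 0:  # terms absent from both texts would give 0/0, so skip them
            total += abs(a - b) / denom
    return total


# Usage: the two cat sentences should end up closer than the unrelated one.
docs = [
    "The cat sat on the mat.",
    "The cat sat on a mat!",
    "Stocks fell sharply, again; analysts worried.",
]
vocab = most_frequent_terms(docs, k=200)
profiles = [profile(d, vocab) for d in docs]
d01 = canberra(profiles[0], profiles[1])
d02 = canberra(profiles[0], profiles[2])
```

Each term contributes at most 1 to the sum, so a few very frequent tokens cannot dominate the distance; this per-term weighting is what distinguishes Canberra from an unweighted L1 distance.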
