Journal: Journal of the American Society for Information Science and Technology

An Improved Algorithm for Unsupervised Decomposition of a Multi-Author Document

Abstract

This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set and which had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracy levels of BayesAD and AK were, in all but one case, worse than a baseline approach wherein one author was assumed to write all sentences in the input text document. Hence, room for improved solutions exists.
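
To make the single-author baseline concrete, the following is a minimal sketch of how it could be scored against a sentence-level clustering. The abstract does not say how accuracy was computed, so the metric here (the best one-to-one mapping of predicted clusters to true authors) is an assumption, and the function names `baseline_accuracy` and `best_match_accuracy` are hypothetical. It is not the evaluation code behind BayesAD or AK.

```python
from collections import Counter
from itertools import permutations


def baseline_accuracy(true_authors):
    """Accuracy when one author is assumed to have written every sentence.

    Assumption: the baseline is credited with the most frequent author.
    """
    counts = Counter(true_authors)
    return max(counts.values()) / len(true_authors)


def best_match_accuracy(true_authors, predicted_clusters):
    """Accuracy under the best one-to-one mapping of clusters to authors.

    Brute force over permutations, so only suitable for a handful of
    authors and clusters. Assumed metric, not taken from the article.
    """
    pairs = Counter(zip(predicted_clusters, true_authors))
    clusters = sorted(set(predicted_clusters))
    authors = sorted(set(true_authors))
    # Pad the shorter side with None so each permutation is a full mapping;
    # sentences in a cluster mapped to None count as errors.
    size = max(len(clusters), len(authors))
    clusters += [None] * (size - len(clusters))
    authors += [None] * (size - len(authors))
    best = max(
        sum(pairs[(c, a)] for c, a in zip(clusters, perm))
        for perm in permutations(authors)
    )
    return best / len(true_authors)


# Toy document: six sentences, two authors (made-up labels).
true_authors = ["A", "A", "A", "B", "B", "A"]
predicted = [0, 0, 0, 1, 1, 1]
print(baseline_accuracy(true_authors))
print(best_match_accuracy(true_authors, predicted))
```

In this toy case the baseline gets 4 of 6 sentences right and the clustering gets 5 of 6. When one author dominates a document, the baseline becomes hard to beat, which is one way to read the abstract's finding for the topic-controlled experiments.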
