
Improving Automatic Text Document Clustering via Selecting a Small Amount of Labeled Data



Abstract

We investigate an approach that improves the performance of automatic text document clustering with the help of a small number of labeled documents. An active learning approach is proposed to select informative documents, for which user feedback on document labels is then obtained. The intermediate cluster structure discovered by the clustering process guides the active learning. Each cluster is represented by a language model, and the uncertainty of a document's cluster assignment serves as a clue for finding informative documents. We have conducted extensive experiments on several real-world corpora. The results demonstrate that the proposed framework is effective.
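The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the language model, the smoothing, or the uncertainty measure, so the unigram model with add-alpha smoothing, the uniform cluster prior, the entropy score, and all function names below are assumptions made for illustration.

```python
import math
from collections import Counter


def cluster_language_model(docs, vocab, alpha=1.0):
    """Unigram model for one cluster with add-alpha smoothing (assumed choice)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def assignment_posterior(doc, models):
    """P(cluster | doc) under a uniform cluster prior, computed in log space."""
    log_liks = [sum(math.log(m[tok]) for tok in doc if tok in m) for m in models]
    top = max(log_liks)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - top) for ll in log_liks]
    norm = sum(weights)
    return [w / norm for w in weights]


def entropy(probs):
    """Shannon entropy of the assignment distribution (assumed uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(docs, clusters, k=1):
    """Indices of the k documents whose cluster assignment is most uncertain.

    docs: list of token lists; clusters: list of lists of document indices
    taken from an intermediate clustering.
    """
    vocab = {tok for doc in docs for tok in doc}
    models = [
        cluster_language_model([docs[i] for i in members], vocab)
        for members in clusters
    ]
    scores = [entropy(assignment_posterior(doc, models)) for doc in docs]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In an active learning loop, the documents returned by `select_uncertain` would be shown to the user for labeling, and the labels fed back into the next clustering round; how the feedback is incorporated is not stated in the abstract and is left out of this sketch.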
