Journal: Information Retrieval

Machine learning techniques for XML (co-)clustering by structure-constrained phrases



Abstract

A new method is proposed for clustering XML documents by structure-constrained phrases. It is implemented through three machine-learning approaches previously unexplored in the XML domain: non-negative matrix (tri-)factorization, co-clustering, and automatic transactional clustering. A novel class of XML features approximates structure-constrained phrases as n-grams contextualized by root-to-leaf paths. Experiments on real-world benchmark XML corpora show that the effectiveness of all three approaches improves with contextualized n-grams of suitable length, which confirms the validity of the method from multiple clustering perspectives. Two of the approaches outperform several state-of-the-art competitors in effectiveness. The scalability of the three approaches is also investigated.
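To make the feature idea concrete, the following is a minimal sketch of how n-grams contextualized by root-to-leaf paths might be extracted. The abstract does not give the exact feature definition, so the function name `contextualized_ngrams`, the whitespace tokenization, the lowercasing, and the choice to ignore mixed content are all assumptions made for illustration, not the authors' method.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def contextualized_ngrams(xml_text, n=2):
    """Count (root-to-leaf path, n-gram) pairs in an XML document.

    Hypothetical helper: tokenization and path encoding are assumptions,
    not taken from the paper.
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def walk(elem, path):
        path = path + [elem.tag]
        children = list(elem)
        if not children:
            # Leaf element: split its text into lowercase tokens.
            tokens = (elem.text or "").lower().split()
            # Slide a window of n tokens and pair each n-gram with the path.
            for i in range(len(tokens) - n + 1):
                features[("/".join(path), tuple(tokens[i:i + n]))] += 1
        for child in children:
            walk(child, path)

    walk(root, [])
    return features


doc = (
    "<paper><title>XML document clustering</title>"
    "<body><sec>clustering XML documents by structure</sec></body></paper>"
)
feats = contextualized_ngrams(doc, n=2)
for (path, gram), count in sorted(feats.items()):
    print(path, " ".join(gram), count)
```

Under these assumptions, the same bigram occurring under two different paths yields two distinct features, which is what allows structure to constrain the phrases. A document-by-feature count matrix built from such features could then serve as input to a factorization or co-clustering routine.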
