Machine learning techniques for XML (co-)clustering by structure-constrained phrases

Costa Gianni; Ortale Riccardo


Abstract

A new method is proposed for clustering XML documents by structure-constrained phrases. It is implemented through three machine-learning approaches previously unexplored in the XML domain, namely non-negative matrix (tri-)factorization, co-clustering and automatic transactional clustering. A novel class of XML features approximately captures structure-constrained phrases as n-grams contextualized by root-to-leaf paths. Experiments over real-world benchmark XML corpora show that the effectiveness of the three approaches improves with contextualized n-grams of suitable length. This confirms the validity of the devised method from multiple clustering perspectives. Two of the approaches outperform several state-of-the-art competitors in effectiveness. The scalability of the three approaches is also investigated.
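To make the notion of "n-grams contextualized by root-to-leaf paths" concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact feature definition, so the tokenization (lowercased whitespace splitting), the restriction to text inside leaf elements, and the helper name `contextualized_ngrams` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def contextualized_ngrams(xml_text, n=2):
    """Count word n-grams of leaf text, each paired with its root-to-leaf tag path.

    Hypothetical helper: an illustrative reading of the feature described
    in the abstract, not the definition used in the paper.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def visit(node, path):
        path = path + (node.tag,)
        children = list(node)
        if not children:
            # Leaf element: tokenize its text and emit (path, n-gram) features.
            tokens = (node.text or "").lower().split()
            for i in range(len(tokens) - n + 1):
                counts[("/".join(path), tuple(tokens[i:i + n]))] += 1
        for child in children:
            visit(child, path)

    visit(root, ())
    return counts


# Toy document (made up for this example).
doc = """<paper>
  <title>XML clustering by structure</title>
  <section><para>clustering by structure and content</para></section>
</paper>"""

features = contextualized_ngrams(doc, n=2)
for (path, gram), count in sorted(features.items()):
    print(path, " ".join(gram), count)
```

In this sketch the same bigram ("clustering", "by") yields two distinct features, one under `paper/title` and one under `paper/section/para`, which is the sense in which the structural context constrains the phrase. Counting such features per document gives a document-by-feature matrix of the kind that factorization or co-clustering methods take as input.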


Bibliographic details

Source
Information retrieval, 2018, Issue 1, pp. 24-55 (32 pages)
Authors
Costa Gianni; Ortale Riccardo
Author affiliations
ICAR CNR, Via P Bucci 41c, Arcavacata Di Rende, CS, Italy (both authors)
Original format
PDF
Language
English
Keywords
XML; Semi-structured data analysis; XML (co-)clustering by structure and nested text; Structure-constrained phrases; Contextualized n-grams

