Constrained Text Coclustering with Supervised and Unsupervised Constraints

Song Yangqiu; Pan Shimei; Liu Shixia; Wei Furu; Zhou Michelle X.; Qian Weihong

Abstract

In this paper, we propose a novel constrained coclustering method to achieve two goals. First, we combine information-theoretic coclustering and constrained clustering to improve clustering performance. Second, we adopt both supervised and unsupervised constraints to demonstrate the effectiveness of our algorithm. The unsupervised constraints are automatically derived from existing knowledge sources, thus saving the effort and cost of using manually labeled constraints. To achieve our first goal, we develop a two-sided hidden Markov random field (HMRF) model to represent both document and word constraints. We then use an alternating expectation maximization (EM) algorithm to optimize the model. We also propose two novel methods to automatically construct and incorporate document and word constraints to support unsupervised constrained clustering: 1) automatically construct document constraints based on overlapping named entities (NE) extracted by an NE extractor; 2) automatically construct word constraints based on their semantic distance inferred from WordNet. The results of our evaluation over two benchmark data sets demonstrate the superiority of our approaches against a number of existing approaches.
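The first unsupervised construction mentioned in the abstract, document constraints derived from overlapping named entities, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `must_link_from_entities`, the `min_overlap` threshold, and the toy entity sets are all assumptions made for the example, and the NE extractor is replaced by precomputed sets of entity strings.

```python
from itertools import combinations


def must_link_from_entities(doc_entities, min_overlap=2):
    """Return (doc_a, doc_b, n_shared) for document pairs sharing enough entities.

    doc_entities: dict mapping a document id to the set of named-entity
        strings found in that document (extraction is assumed done elsewhere).
    min_overlap: hypothetical threshold, not a value taken from the paper.
    """
    constraints = []
    # Visit every unordered pair of documents once, in a stable order.
    for a, b in combinations(sorted(doc_entities), 2):
        shared = doc_entities[a] & doc_entities[b]
        if len(shared) >= min_overlap:
            constraints.append((a, b, len(shared)))
    return constraints


# Toy input: entity sets are invented for illustration only.
docs = {
    "d1": {"IBM", "New York", "Watson"},
    "d2": {"IBM", "Watson", "Jeopardy"},
    "d3": {"Beijing", "Microsoft"},
}
print(must_link_from_entities(docs))  # [('d1', 'd2', 2)]
```

The resulting pairs would play the role of must-link document constraints in a constrained clustering model; how the paper weights or filters them is not stated in the abstract.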

Bibliographic details

Source
IEEE Transactions on Knowledge and Data Engineering | 2013, no. 6 | pp. 1227-1239 | 13 pages
Authors
Song Yangqiu; Pan Shimei; Liu Shixia; Wei Furu; Zhou Michelle X.; Qian Weihong
Author affiliation

Microsoft Research Asia, Beijing

Original format: PDF
Text language: English
Keywords
Clustering algorithms; Clustering methods; Computational modeling; Hidden Markov models; Humans; Semantics; Sparse matrices; Constrained clustering; coclustering; text clustering; unsupervised constraints;

Similar literature

1. Correlative study and analysis for hidden patterns in text analytics unstructured data using supervised and unsupervised learning techniques [J]. E. Laxmi Lydia, S. Kannan, S. SumanRajest. International Journal of Cloud Computing, 2020, no. 2a3

2. Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives [J]. Rohan Nanda, Giovanni Siragusa, Luigi Di Caro. Artificial Intelligence and Law, 2019, no. 2

3. Seed Word Selection for Weakly-Supervised Text Classification with Unsupervised Error Estimation [C]. Yiping Jin, Akshay Bhatia, Dittaya Wanvarie. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

4. Automatic Detection of Section Title and Prose Text in HTML Documents Using Unsupervised and Supervised Learning [D]. Mysore Gopinath, Abhijith Athreya. 2018

5. Exploring supervised and unsupervised methods to detect topics in biomedical text [O]. Minsuk Lee, Weiqing Wang, Hong Yu. 2006

6. Supervised and Unsupervised Neural Approaches to Text Readability [O]. Matej Martinc, Senja Pollak, Marko Robnik-Šikonja. 2021
