Improving Automatic Text Document Clustering via Selecting a Small Amount of Labeled Data

Abstract

We have investigated an approach that improves automatic text document clustering performance with the help of a small number of labeled documents. An active learning approach is proposed to select informative documents for which user feedback on document labels is obtained. We make use of the intermediate cluster structure, which is discovered by the clustering process, to guide the active learning. Each cluster is represented by a language model. We use the uncertainty of document assignments as a clue for finding informative documents. We have conducted extensive experiments on several real-world corpora. The results demonstrate that our proposed framework is effective.
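The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the language model, the uncertainty measure, or the selection rule, so everything below is an assumption made for illustration: clusters are modeled as add-one-smoothed unigram models, uncertainty is the entropy of the cluster posterior under a uniform prior, and the names `unigram_model`, `posterior`, `entropy`, and `select_uncertain` are hypothetical. It should not be read as the authors' method.

```python
import math
from collections import Counter

def unigram_model(docs, vocab, alpha=1.0):
    """Additively smoothed unigram language model for one cluster (assumed form)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def posterior(doc, models):
    """P(cluster | doc) under a uniform cluster prior, computed in log space."""
    log_liks = [sum(math.log(m[tok]) for tok in doc) for m in models]
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]
    norm = sum(weights)
    return [w / norm for w in weights]

def entropy(probs):
    """Shannon entropy (nats); higher means a less certain assignment."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_uncertain(docs, clusters, k=1):
    """Indices of the k documents whose cluster assignment is most uncertain."""
    vocab = {tok for doc in docs for tok in doc}
    models = [unigram_model([docs[i] for i in members], vocab)
              for members in clusters]
    scores = [entropy(posterior(doc, models)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]

# Toy corpus: tokenized documents and an intermediate clustering (index lists).
docs = [
    ["ball", "goal", "team"],
    ["goal", "team", "match"],
    ["stock", "market", "bank"],
    ["bank", "market", "trade"],
    ["team", "market"],  # shares vocabulary with both clusters
]
clusters = [[0, 1, 4], [2, 3]]
print(select_uncertain(docs, clusters, k=1))  # prints [4]
```

In this toy run the fifth document is chosen because its words are split between the two clusters, which is the kind of document a user would then be asked to label.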

Bibliographic record

Source
The 9th international symposium on knowledge and systems sciences jointly with 4th Asia-Pacific international conference on knowledge management, 2008, pp. 54-60 (7 pages)
Conference location: Guangzhou (CN)
Authors
Ruizhang Huang; Wai Lam
Author affiliations

Dept. of Industrial Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China;

Dept. of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong, China;

Original format: PDF
Language: English
Chinese Library Classification: Information and knowledge dissemination
Keywords
knowledge management; text mining; active learning; semi-supervised document clustering;

Similar literature

1. DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering [J] . Lakshmi R., Baskar S. Journal of Information Science . 2019, No. 6

2. Improving hierarchical document cluster labels through candidate term selection [J] . Fabiano Fernandes dos Santos, Veronica Oliveira de Carvalho, Solange Oliveira Rezende. Intelligent Decision Technologies . 2012, No. 1

3. Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering [J] . Abualigah Laith Mohammad, Khader Ahamad Tajudin, Al-Betar Mohammed Azmi. Expert Systems with Applications . 2017, No. octa

4. Improving Automatic Text Document Clustering via Selecting a Small Amount of Labeled Data [C] . Ruizhang Huang, Wai Lam. International symposium on knowledge and systems sciences . 2008

5. Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters. [D] . Li, Xiaoxiao. 2012

6. SANAD: Single-label Arabic News Articles Dataset for automatic text categorization [O] . Omar Einea, Ashraf Elnagar, Ridhwan Al Debsi. 2019

7. Text Document Topical Recursive Clustering and Automatic Labeling of a Hierarchy of Document Clusters [O] . Xiaoxiao Li, Jiyang Chen, Osmar Zaiane. 2013
