On the use of linear programming for unsupervised text classification

Abstract

We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as the underlying process that generates the corpus and apply a novel L1-norm-based approach introduced by Kleinberg and Sandler [19]. We show that our algorithm performs extremely well on large datasets, with peak accuracy approaching that of supervised learning based on Support Vector Machines (SVMs) with large training sets. The method is based on the same idea that underlies Latent Semantic Indexing (LSI): we find a good low-dimensional subspace of the feature space and project all documents into it. However, our projection minimizes a different error, and unlike LSI we build a basis that in many cases corresponds to the actual topics. We present the results of testing our algorithm on the abstracts of arXiv, an electronic repository of scientific papers, and on the 20 Newsgroups dataset, a small snapshot of 20 specific newsgroups.

Bibliographic details

Source
ACM SIGKDD international conference on Knowledge discovery in data mining, 2005, pp. 256-264 (9 pages)

Author
Mark Sandler

Original format: PDF

Chinese Library Classification: computing technology, computer technology

Keywords
unsupervised learning

Similar literature

Foreign-language literature

1. Using unsupervised clustering approach to train the Support Vector Machine for text classification [J]. Shafiabady Niusha, Lee L. H., Rajkumar R. Neurocomputing. 2016, No. octa26

2. Finding structure in noisy text: topic classification and unsupervised clustering [J]. Prem Natarajan, Rohit Prasad, Krishna Subramanian. International Journal on Document Analysis and Recognition. 2007, No. 3a4

3. Nonlinear classification of hERG channel inhibitory activity by unsupervised classification method [J]. Hidaka S, Yamasaki H, Ohmayu Y. The Journal of toxicological sciences. 2010, No. 3

4. On the Use of Linear Programming for Unsupervised Text Classification [C]. Mark Sandler. Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'05); August 21-24, 2005; Chicago, IL (US). 2005

5. Unsupervised classification of text documents [D]. Aparicio Carrasco, Roxana K. 2008

6. The articles.ELM resource: simplifying access to protein linear motif literature by annotation text-mining and classification [O]. N Palopoli, J A Iserte, L B Chemes. 2020

7. On the Use of Linear Programming for Unsupervised Text Classification [O]. Sandler Mark. 2005

Chinese-language literature

1. Unsupervised text classification based on linear programming [J]. 袁玉虎. 软件导刊. 2012, No. 007

2. An unsupervised text classification algorithm based on k-nearest neighbors [J]. 余小鹏, 马费成. 情报学报. 2008, No. 004

3. Automatically classifying big data stored in cloud data management systems using a content-based text classification method [J]. 刘博斐, 雒琛. 电子技术与软件工程. 2017, No. 020

4. Chinese text classification using a logistic regression model [J]. 李新福, 赵蕾蕾, 何海斌. 计算机工程与应用. 2009, No. 014

5. Standardless quantitative analysis of refractory materials using Rietveld whole-pattern fitting (abstract) [C]. Wang Lin, 王林, Zhu Xiaodong. 帕纳科第13届用户X射线分析仪器技术交流会. 2014

6. Unsupervised extractive text summarization using sentence embeddings [A]. AHMAD SHEHZAD. 2021

Patents

1. Unsupervised text classification method, apparatus, electronic device, and storage medium [P]. Chinese patent: CN113704479B. 2022-02-18

2. Method for creating a cultural relic security knowledge graph based on word-expansion unsupervised text classification [P]. Chinese patent: CN114138979A. 2022-03-04

3. Text automatic classification device, text automatic classification program, and computer-readable recording medium recording the text automatic classification method and text automatic classification program [P]. Foreign patent: JP4711556B2. 2011-06-29

4. Unsupervised removal of text from images using linear programming for optimal filter design [P]. Foreign patent: US10657369B1. 2020-05-19

5. TEXT CLASSIFICATION SYSTEM, TEXT CLASSIFICATION METHOD, AND TEXT CLASSIFICATION PROGRAM [P]. Foreign patent: WO2015025978A1. 2015-02-26
