Document grouping with concept based discriminative analysis and feature partition

Abstract

Clustering is one of the most important techniques in machine learning and data mining. Similar documents are grouped by applying clustering techniques, and a similarity measure is used to determine transaction associations. Hierarchical clustering methods produce tree-structured results, whereas partition-based clustering models produce results in a grid format. Text documents are unstructured data with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods need the cluster count (K) before the document grouping process begins, and clustering accuracy decreases drastically when an unsuitable cluster count is chosen. Document word features are automatically partitioned into two groups: discriminative words and non-discriminative words. Only discriminative words are useful for grouping documents; the contribution of non-discriminative words confuses the clustering process and leads to poor cluster solutions. A variational inference algorithm is used to infer the structure of the document collection and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents; it utilizes both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model, and it runs without requiring the number of clusters as input. The discriminative word identification process is enhanced with a labeled document analysis mechanism. Concept relationships are analyzed with ontology support, and semantic weight analysis is used for the document similarity measure. The method improves scalability by using labels and concept relations in the dimensionality reduction process.
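
The "clustering property" of the Dirichlet Process mentioned in the abstract is commonly illustrated with the Chinese restaurant process, in which each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. The sketch below samples from that prior only; it is not the DPMFP model or the variational inference described in the abstract, and the function name `crp_assignments` and the parameter `alpha` are assumptions chosen for illustration.

```python
import random
from collections import Counter


def crp_assignments(n_docs, alpha, seed=0):
    """Sample cluster labels for n_docs items from a Chinese restaurant process.

    alpha is the (assumed) concentration parameter and must be positive;
    larger values tend to open more clusters.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_docs):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n_docs=200, alpha=1.0)
    # The number of clusters is an outcome of sampling, not an input.
    print(len(set(labels)), "clusters; largest:", Counter(labels).most_common(3))
```

The point of the sketch is that the cluster count is never passed in: it emerges from the sampling, which is the property that lets DPM-style models avoid fixing K in advance.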

Bibliographic details

Source
International Conference on Information Communication and Embedded Systems | 2014 | 4 pages
Authors
Kajapriya S.; Vimal Shankar K.N.
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords
data mining; inference mechanisms; learning (artificial intelligence); mixture models; ontologies (artificial intelligence); text analysis; variational techniques; DPM clustering model; Dirichlet process mixture model for feature partition; clustering property; concept based discriminative analysis; data likelihood; data mining; dimensionality cutback process; discriminative word identification process; discriminative words; document analysis mechanism; document clustering accuracy; document word features; document word partition; hierarchical clustering method; latent cluster structure; machine learning; partition based clustering model; similarity measure; transaction associations; Clustering methods; Educational institutions; Feature extraction; Partitioning algorithms; Semantics; Text mining; Database management; Dirichlet Process Mixture Model; Document Clustering; Feature Partition; Text mining;

