Semantic preserving text representation and its applications in text clustering.

Abstract

Text mining using the vector space representation has proven to be a valuable tool for classification, prediction, information retrieval, and extraction. The nature of text data presents several issues for these tasks, including high dimensionality and the presence of special word types such as polysemous and synonymous words. A variety of techniques have been devised to overcome these shortcomings, including feature selection and word sense disambiguation. Privacy-preserving data mining is also an area of emerging interest. Existing techniques for privacy-preserving data mining require secure computation protocols, which often incur a greatly increased computational cost. In this paper, a generalization-based method is presented for creating a semantic-preserving vector space model (SPVSM) that reduces dimensionality and addresses the problems with special word types. The SPVSM also allows private text data to be safely represented without degrading cluster accuracy or performance. Further, the result can be used in combination with theoretic-based techniques such as latent semantic indexing. The performance of text clustering using the semantic-preserving generalization method is evaluated against existing feature selection techniques and shown to have significant merit from a clustering perspective.
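The abstract does not spell out the generalization procedure, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes a small, hypothetical lookup table (`GENERALIZE`) that maps specific terms to a broader concept, and shows how such a mapping can merge synonymous terms and shrink the vocabulary of a term-count vector space. All names and data here are illustrative.

```python
from collections import Counter

# Hypothetical generalization map: each specific term points to a broader concept.
# These entries are illustrative only; they are not taken from the thesis.
GENERALIZE = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "physician": "doctor",
    "surgeon": "doctor",
}


def generalize(tokens):
    """Replace each token with its broader concept when one is known."""
    return [GENERALIZE.get(t, t) for t in tokens]


def vectorize(docs, vocab):
    """Build one term-count vector per document over a fixed vocabulary."""
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vectors


docs = [
    "the physician bought a car".split(),
    "the surgeon sold an automobile".split(),
]

# Vocabulary and vectors before generalization.
raw_vocab = sorted({t for d in docs for t in d})
raw_vectors = vectorize(docs, raw_vocab)

# Vocabulary and vectors after generalization.
gen_docs = [generalize(d) for d in docs]
gen_vocab = sorted({t for d in gen_docs for t in d})
gen_vectors = vectorize(gen_docs, gen_vocab)

print(len(raw_vocab), len(gen_vocab))
```

In this toy example the two documents share only the word "the" before generalization. Afterwards they also share the "doctor" and "vehicle" dimensions, and the specific terms no longer appear in the representation.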

Bibliographic record

  • Author: Howard, Michael
  • Author affiliation: Missouri University of Science and Technology
  • Degree-granting institution: Missouri University of Science and Technology
  • Discipline: Computer Science
  • Degree: M.S.
  • Year: 2012
  • Pages: 48 p. (48 total)
  • Original format: PDF
  • Language of text: English
  • Date indexed: 2022-08-17 11:42:48
