Computer-aided Semantic Signature Identification and Document Classification via Semantic Signatures.


Abstract

In this era of textual data explosion on the World Wide Web, it can be very hard to find documents similar to those that interest us. To address this problem, we have developed a type of semantic signature that captures the semantics of target content (text). Semantic signatures are derived from a text or document of interest using the Semantic Signature Mining Tool (SSMinT), a software package developed as part of this thesis work in collaboration with Sri Ramya Peddada. The signatures are then used to search for and retrieve documents with similar semantic patterns. We illustrate how different representations of semantic signatures affect document classification outcomes, and we compare the classification accuracy of Euclidean and spherical K-means clustering on the retrieved documents. A chi-square test indicates that the observed and expected numbers of documents retrieved from a corpus are not significantly different, which supports the claim that the semantic signature concept can retrieve documents of interest with high probability. Our findings indicate that this concept has potential for use in commercial text and document search applications.
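The difference between the two clustering variants named in the abstract can be illustrated with the assignment step alone. Euclidean K-means assigns a vector to the nearest centroid by straight-line distance, while spherical K-means assigns it by cosine similarity, so vector length (for example, document length) is ignored. The sketch below is not taken from the thesis or from SSMinT; the function names and the two-dimensional term-count vectors are hypothetical and chosen only so that the two rules disagree.

```python
import math


def euclidean_assign(vec, centroids):
    """Return the index of the centroid closest to vec in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def spherical_assign(vec, centroids):
    """Return the index of the centroid with the highest cosine similarity to vec."""
    return max(range(len(centroids)), key=lambda i: cosine_similarity(vec, centroids[i]))


# Hypothetical data: a long document dominated by the first term,
# one short centroid pointing the same way, one longer mixed centroid.
doc = [10.0, 1.0]
centroids = [[1.0, 0.0], [6.0, 6.0]]

euclidean_choice = euclidean_assign(doc, centroids)
spherical_choice = spherical_assign(doc, centroids)
print("Euclidean assignment:", euclidean_choice)
print("Spherical assignment:", spherical_choice)
```

With these toy vectors the Euclidean rule favors the centroid of similar magnitude, while the spherical rule favors the centroid of similar direction. A full clustering run would also recompute centroids after each assignment pass (and renormalize them in the spherical case), which is omitted here.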
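The abstract does not say how the chi-square test was set up, so the following is only a minimal sketch that assumes a goodness-of-fit form: observed counts of retrieved documents are compared with expected counts, and the statistic is checked against the standard critical value for one degree of freedom at the 0.05 level. The counts and the two categories (relevant versus not relevant) are hypothetical, not results from the thesis.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two categories: relevant / not relevant documents retrieved.
observed = [45, 5]
expected = [40, 10]

# Critical value of the chi-square distribution for df = 1, alpha = 0.05.
CRITICAL_DF1_ALPHA_05 = 3.841

statistic = chi_square_statistic(observed, expected)
reject_null = statistic > CRITICAL_DF1_ALPHA_05
print("chi-square statistic:", statistic)
print("reject null hypothesis at alpha = 0.05:", reject_null)
```

When the statistic falls below the critical value, the null hypothesis that observed and expected counts agree is not rejected. This is weaker than a proof of agreement; it only means the data show no significant difference at the chosen level.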

Bibliographic record

  • Author: Para, Uday Kiran
  • Author affiliation: West Virginia University
  • Degree-granting institution: West Virginia University
  • Subject: Computer Science; Artificial Intelligence
  • Degree: M.S.
  • Year: 2010
  • Pages: 104
  • Original format: PDF
  • Language: English
