Conference: International Conference Applications of Mathematics in Engineering and Economics

Assessing semantic similarity of texts - Methods and algorithms

Abstract

Assessing the semantic similarity of texts is an important part of text-related applications such as educational systems, information retrieval, and text summarization. The task is carried out through analysis that relies on text-mining techniques. Text mining involves several pre-processing steps that produce a structured representation of the documents in a corpus by extracting and selecting the features that characterize their content. The resulting model is generally vector-based and enables further analysis with knowledge-discovery approaches. Algorithms and measures are used to assess texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA), which reduces the dimensionality of the document vector space and captures text semantics better. The paper examines the mathematical background of LSA, which derives the meaning of the words in a given text by exploring their co-occurrence. It also presents the algorithm for obtaining the vector representations of words and their corresponding latent concepts in a reduced multidimensional space, together with the similarity calculation.
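
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the standard LSA pipeline it describes: build a term-document matrix, reduce it with a truncated singular value decomposition, and compare documents by cosine similarity in the reduced space. The toy corpus, the raw-count weighting, the whitespace tokenizer, and the helper names (`term_document_matrix`, `lsa`, `cosine`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def term_document_matrix(docs):
    """Build a raw-count term-document matrix (rows: terms, columns: documents).

    Assumption: lowercase whitespace tokenization and raw counts; real
    pipelines usually add stop-word removal and a weighting such as TF-IDF.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            matrix[index[tok], j] += 1.0
    return vocab, matrix


def lsa(matrix, k):
    """Rank-k LSA via SVD: A is approximated by U_k S_k V_k^T.

    Returns term vectors (rows of U_k S_k) and document vectors
    (rows of V_k S_k) in the k-dimensional latent space.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    term_vectors = u[:, :k] * s[:k]
    doc_vectors = vt[:k, :].T * s[:k]
    return term_vectors, doc_vectors


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical toy corpus: two documents about pets, two about markets.
docs = [
    "cat dog pet",
    "dog pet animal",
    "stock market trade",
    "market trade price",
]

vocab, matrix = term_document_matrix(docs)
term_vecs, doc_vecs = lsa(matrix, k=2)

sim_same_topic = cosine(doc_vecs[0], doc_vecs[1])
sim_cross_topic = cosine(doc_vecs[0], doc_vecs[2])
raw_same_topic = cosine(matrix[:, 0], matrix[:, 1])

print(f"raw cosine, docs 0 and 1:       {raw_same_topic:.3f}")
print(f"LSA (k=2) cosine, docs 0 and 1: {sim_same_topic:.3f}")
print(f"LSA (k=2) cosine, docs 0 and 2: {sim_cross_topic:.3f}")
```

In this toy corpus the first two documents share only two of their three words, so their raw cosine similarity is 2/3. After projection onto two latent dimensions they should come out nearly identical, while documents from different topics stay close to orthogonal. The choice of `k` is a tuning parameter; keeping all singular values reproduces the raw cosine similarities exactly.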