Journal: Journal of Theoretical and Applied Information Technology

TEXT INTERPRETATION USING A MODIFIED PROCESS OF THE ONTOLOGY AND SPARSE CLUSTERING



Abstract

Texts in online media carry varied information that must be extracted and interpreted clearly. To better understand the content of text collected from online media, a proper methodology for interpreting the useful information must be developed. This study offers a modified text interpretation process consisting of four stages. The first, preliminary stage is text preprocessing and key phrase extraction using the annotated suffix tree (AST) technique. The second stage develops a sparse clustering method, named iterative scaling of fuzzy additive spectral clustering (is-FADDIS), combined with a sharpening technique for grouping key phrases from the text. In the third stage, an ontology serving as the "knowledge base" is developed in combination with the is-FADDIS method. The final stage carries out the interpretation of the input text. The is-FADDIS clustering combined with the sharpening technique reached performance as high as 96% on some modeled sparse data and 78% on two specific real sparse data sets from two corpora, and could perform better than Nonnegative Matrix Factorization (NMF) and K-means. Text interpretation using the ontology gives a clear graph visualization of the relationships among key phrases, even though it has a low correlation with the content of the text. The findings of this study could help establish an automatic process for interpreting any topic information collected from online media.
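To make the first stage more concrete, the following is a minimal sketch of how an annotated suffix tree can score a candidate key phrase against a text. It is not the authors' implementation: it uses an uncompressed suffix trie rather than a compact tree, and the names `Node`, `build_annotated_suffix_trie`, `suffix_score`, and `phrase_score` are hypothetical. The scoring rule (average conditional frequency along the matched path, averaged over all suffixes of the phrase) is an assumption based on the usual description of AST scoring and may differ from the variant used in the paper.

```python
class Node:
    """Trie node annotated with the number of inserted suffixes passing through it."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_annotated_suffix_trie(fragments):
    """Insert every suffix of every text fragment and annotate nodes with counts."""
    root = Node()
    for fragment in fragments:
        for start in range(len(fragment)):
            node = root
            node.count += 1  # the root counts all inserted suffixes
            for ch in fragment[start:]:
                node = node.children.setdefault(ch, Node())
                node.count += 1
    return root


def suffix_score(root, s):
    """Average of count(child) / count(parent) along the longest match of s."""
    node, total, matched = root, 0.0, 0
    for ch in s:
        child = node.children.get(ch)
        if child is None:
            break
        total += child.count / node.count
        node = child
        matched += 1
    return total / matched if matched else 0.0


def phrase_score(root, phrase):
    """Average the suffix scores over all suffixes of the phrase (assumed rule)."""
    if not phrase:
        return 0.0
    scores = [suffix_score(root, phrase[i:]) for i in range(len(phrase))]
    return sum(scores) / len(scores)


# Hypothetical fragments standing in for preprocessed text.
fragments = ["sparse clustering", "fuzzy clustering", "text clustering"]
trie = build_annotated_suffix_trie(fragments)
for candidate in ["clustering", "ontology"]:
    print(candidate, round(phrase_score(trie, candidate), 3))
```

Because every term in the score is a conditional frequency, the result stays between 0 and 1, and a phrase that appears in the fragments scores higher than one that only shares a few characters with them. In a full pipeline, such phrase-to-text scores would feed the similarity data that the clustering stage groups.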
