Journal: Facta Universitatis. Series Mathematics and Informatics

THE INFLUENCE OF TEXT PREPROCESSING METHODS AND TOOLS ON CALCULATING TEXT SIMILARITY


Abstract

Text mining depends to a great extent on various text preprocessing techniques. The preprocessing methods and tools used to prepare texts for further mining can be divided into those which are language-dependent and those which are not. The subject matter of this research was the analysis of the influence of these methods and tools on further text mining. We first focused on the analysis of the influence on the reduction of the vector space model for the multidimensional representation of text documents. We then analyzed the influence on calculating text similarity, which is the focus of this research. The conclusion we reached is that the implementation of various text preprocessing methods in the Serbian language, which are used for the reduction of the vector space model for the multidimensional representation of text documents, achieves the required results. However, the implementation of various text preprocessing methods specific to the Serbian language for the purpose of calculating text similarity can lead to great differences in the results.
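
The two effects named in the abstract, a smaller vector space and a shift in text similarity, can be illustrated with a minimal sketch. It is not the authors' method or tooling: it uses two toy English sentences rather than Serbian text, and the stop-word list, the suffix rules, and every function name are hypothetical choices made only for this illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and suffix rules, for illustration only.
# Real preprocessing tools for Serbian are far more elaborate.
STOP_WORDS = {"the", "a", "of", "is", "on", "and", "are"}
SUFFIXES = ("ing", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then apply the crude stemmer."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = "The tools are processing the texts"
doc_b = "A tool processes the text"

raw_a, raw_b = tokenize(doc_a), tokenize(doc_b)
pre_a, pre_b = preprocess(doc_a), preprocess(doc_b)

# The vocabulary size is the dimensionality of the vector space model.
raw_vocab = set(raw_a) | set(raw_b)
pre_vocab = set(pre_a) | set(pre_b)

raw_sim = cosine(raw_a, raw_b)
pre_sim = cosine(pre_a, pre_b)

print("dimensions without preprocessing:", len(raw_vocab))
print("dimensions with preprocessing:   ", len(pre_vocab))
print("similarity without preprocessing:", round(raw_sim, 3))
print("similarity with preprocessing:   ", round(pre_sim, 3))
```

In this toy case the vocabulary shrinks from nine terms to three, and the cosine similarity rises from roughly 0.32 to roughly 1.0, because the only term shared by the raw token lists is the stop word "the". The size of the swing is an artifact of the chosen sentences, but it shows why a preprocessing choice that is harmless for dimensionality reduction can still change similarity scores substantially.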
