The Open Cybernetics & Systemics Journal

Tibetan-Chinese Cross-Language Text Similarity Calculation Based on LDA Topic Model



Abstract

Building a topic model is the basis of, and the most critical module in, cross-language topic detection and tracking. A topic model can also be applied to cross-language text similarity calculation, where it improves the speed and efficiency of the calculation by reducing the dimensionality of the texts. In this paper, we use the LDA model in cross-language text similarity computation to obtain Tibetan-Chinese comparable corpora: (1) extending a Tibetan-Chinese dictionary by extracting Tibetan-Chinese entities from Wikipedia; (2) using the topic model to map the texts to the feature space of topics; (3) calculating the similarity of two texts in different languages according to the characteristics of news text. The text similarity calculation method based on the LDA model reduces the dimensionality of the text vector space, enhances the understanding of the texts' semantics, and improves the speed and efficiency of the calculation.
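The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the dictionary is applied or which similarity measure is used, so this sketch assumes that Tibetan tokens are mapped onto Chinese tokens through the dictionary, that a small collapsed Gibbs sampler stands in for the LDA model, and that cosine similarity compares the topic distributions. The tokens, the dictionary entries, and the names `lda_gibbs`, `cosine`, and `bilingual_dict` are hypothetical placeholders.

```python
import math
import random


def lda_gibbs(docs, num_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # total words in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Step 1 (assumed form): a toy bilingual dictionary with placeholder tokens.
bilingual_dict = {
    "bo_economy": "zh_economy", "bo_market": "zh_market", "bo_trade": "zh_trade",
    "bo_football": "zh_football", "bo_match": "zh_match", "bo_team": "zh_team",
}
chinese_docs = [
    ["zh_economy", "zh_market", "zh_trade", "zh_economy"],
    ["zh_football", "zh_match", "zh_team", "zh_match"],
]
tibetan_docs = [
    ["bo_economy", "bo_trade", "bo_market"],
    ["bo_team", "bo_football", "bo_match"],
]
# Map Tibetan tokens into the Chinese vocabulary; tokens missing from the dictionary are dropped.
translated = [[bilingual_dict[t] for t in doc if t in bilingual_dict] for doc in tibetan_docs]

# Step 2: map every document into a shared topic space.
all_docs = chinese_docs + translated
vocab = sorted({w for doc in all_docs for w in doc})
word_id = {w: i for i, w in enumerate(vocab)}
id_docs = [[word_id[w] for w in doc] for doc in all_docs]
theta = lda_gibbs(id_docs, num_topics=2, vocab_size=len(vocab))

# Step 3: compare a Chinese document with a Tibetan document in topic space.
sim = cosine(theta[0], theta[2])
print(f"similarity(chinese_docs[0], tibetan_docs[0]) = {sim:.3f}")
```

Each document is represented by `num_topics` values rather than one value per vocabulary word, which is the dimensionality reduction the abstract refers to. On a real corpus a library LDA implementation and a much larger dictionary would replace the toy sampler and placeholder entries.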
