International Conference on Business and Industrial Research

Probabilistic learning models for topic extraction in Thai language


Abstract

Natural language processing (NLP) in the Thai language is notoriously complicated. One major problem is the lack of word boundaries within a sentence, which introduces ambiguity in word tokenization. For topic extraction, semantic ambiguity adds another layer of complexity. Topic models that disregard word order, such as Latent Dirichlet Allocation (LDA), perform poorly on Thai. In this paper, we experimented with and tested a probabilistic language model equipped with word location information, the so-called Topic N-grams model (TNG). We deployed several testing tasks to assess TNG's ability to model the generative process of Thai text, and established benchmarks comparing the performance of LDA and TNG on various Thai NLP tasks. To our knowledge, this paper is the first to explore a word-order model for topic extraction in Thai. We concluded that TNG can help boost the performance of Thai language processing in word cutting (segmentation), semantic checking, word prediction, and document generation tasks. We also explored how the performance of LDA and TNG on such tasks can be measured using perplexity.
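The evaluation metric named in the abstract is perplexity. As a minimal sketch only (not the authors' implementation), the Python snippet below estimates the perplexity of a plain LDA baseline with gensim on a toy corpus of pre-segmented Thai tokens; the documents, topic count, and other parameters are hypothetical placeholders, and TNG itself is not part of gensim, so this illustrates the metric only for the LDA side of the comparison.

# Minimal sketch (assumes gensim is installed; not the authors' code).
# Each document is a list of Thai word tokens; word segmentation is assumed
# to have been done already by an external Thai tokenizer.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["ตลาด", "หุ้น", "ไทย", "ปรับ", "ตัว", "ขึ้น"],   # hypothetical toy documents
    ["นักลงทุน", "ซื้อ", "หุ้น", "ธนาคาร"],
    ["เศรษฐกิจ", "ไทย", "ขยาย", "ตัว", "ต่อเนื่อง"],
]

dictionary = Dictionary(docs)                    # token <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words vectors

lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)

# gensim returns a per-word log2 likelihood bound; perplexity = 2^(-bound).
bound = lda.log_perplexity(corpus)
print("perplexity estimate:", 2 ** (-bound))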
