International Conference on Business and Industrial Research

Probabilistic learning models for topic extraction in Thai language


Abstract

Natural language processing (NLP) in the Thai language is notoriously complicated. One major problem is the lack of word boundaries within a sentence, which introduces ambiguity in word tokenization. For topic extraction, semantic ambiguity adds another layer of complexity. Topic models that disregard word order, such as Latent Dirichlet Allocation (LDA), perform poorly on Thai. In this paper, we experimented with and tested a probabilistic language model equipped with word location information, the so-called Topic N-grams model (TNG). We deployed several testing tasks to assess TNG's ability to model the generative process of Thai text, and established benchmarks comparing the performance of LDA and TNG on various Thai NLP tasks. To our knowledge, this paper is the first to explore a word-order model for topic extraction in Thai. We concluded that TNG can help boost the performance of Thai language processing in word cutting (segmentation), semantic checking, word prediction, and document generation tasks. We also explored how the performance of LDA and TNG on such tasks can be measured using perplexity.
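The evaluation metric named in the abstract is perplexity. As a minimal sketch only (not the authors' implementation), the Python snippet below estimates the perplexity of a plain LDA baseline with gensim on a toy corpus of pre-segmented Thai tokens; the documents, topic count, and other parameters are hypothetical placeholders, and TNG itself is not part of gensim, so this illustrates the metric only for the LDA side of the comparison.

# Minimal sketch (assumes gensim is installed; not the authors' code).
# Each document is a list of Thai word tokens; word segmentation is assumed
# to have been done already by an external Thai tokenizer.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["ตลาด", "หุ้น", "ไทย", "ปรับ", "ตัว", "ขึ้น"],   # hypothetical toy documents
    ["นักลงทุน", "ซื้อ", "หุ้น", "ธนาคาร"],
    ["เศรษฐกิจ", "ไทย", "ขยาย", "ตัว", "ต่อเนื่อง"],
]

dictionary = Dictionary(docs)                    # token <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words vectors

lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)

# gensim returns a per-word log2 likelihood bound; perplexity = 2^(-bound).
bound = lda.log_perplexity(corpus)
print("perplexity estimate:", 2 ** (-bound))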
