Using dictionary in a knowledge based algorithm for clustering short texts in Bahasa Indonesia


Abstract

Text clustering is important in many applications of information retrieval. This paper presents a study of clustering short texts in Bahasa Indonesia using a semantic similarity approach in which a dictionary of synonyms and hyponyms supplies information on word relatedness. We compare sentence similarity calculations based on lexical matching and on word similarity. More than 250 sentences are involved. Our experiment shows that clustering using sentence similarity based on lexical matching performs better in terms of precision and F-measure than clustering using sentence similarity based on the semantic approach.
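The two kinds of sentence similarity contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `RELATED` dictionary, the 0.5 score for related words, the Jaccard measure, and the best-match averaging are all assumptions chosen only to show how a dictionary lookup can raise the similarity of sentences that share no words.

```python
from typing import Dict, Set

# Hypothetical toy dictionary mapping a word to related words
# (synonyms or hyponyms). Entries are illustrative, not from the paper.
RELATED: Dict[str, Set[str]] = {
    "mobil": {"kendaraan", "oto"},
    "kendaraan": {"mobil", "motor"},
    "besar": {"raya", "agung"},
}


def tokenize(sentence: str) -> Set[str]:
    """Lowercase the sentence and split on whitespace into a token set."""
    return set(sentence.lower().split())


def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets (exact word matches only)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def word_similarity(w1: str, w2: str) -> float:
    """1.0 for identical words, 0.5 (assumed) if the dictionary relates them."""
    if w1 == w2:
        return 1.0
    if w2 in RELATED.get(w1, set()) or w1 in RELATED.get(w2, set()):
        return 0.5
    return 0.0


def semantic_similarity(a: str, b: str) -> float:
    """Average, in both directions, of each word's best match in the other sentence."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    best_a = sum(max(word_similarity(w, v) for v in tb) for w in ta) / len(ta)
    best_b = sum(max(word_similarity(w, v) for v in ta) for w in tb) / len(tb)
    return (best_a + best_b) / 2


if __name__ == "__main__":
    s1, s2 = "mobil besar", "kendaraan raya"
    print(lexical_similarity(s1, s2))   # no shared tokens
    print(semantic_similarity(s1, s2))  # dictionary links both word pairs
```

Either function could feed a pairwise similarity matrix into a standard clustering routine; which clustering algorithm the paper used is not stated in the abstract.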


Bibliographic details

Source
International Conference on Data and Software Engineering | 2014 | pp. 1-4 | 4 pages
Authors
Thamrin Husni; Sabardila Atiqa
Original format: PDF
Keywords
Bahasa Indonesia; dictionary; text clustering; word relatedness


Similar literature


1. Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm [J]. Wu Di, Yang Ruixin, Shen Chao. Journal of Intelligent Information Systems. 2021, No. 1

2. Normalization of Abbreviation and Acronym on Microtext in Bahasa Indonesia by Using Dictionary-Based and Longest Common Subsequence (LCS) [J]. Dani Gunawan, Zurwatus Saniyah, Ainul Hizriadi. Procedia Computer Science. 2019, No. 12

3. Context-sensitive normalization of social media text in bahasa Indonesia based on neural word embeddings [J]. Renny Pradina Kusumawardani, Stezar Priansya, Faizal Johan Atletiko. Procedia Computer Science. 2018, No. 22

4. Using dictionary in a knowledge based algorithm for clustering short texts in Bahasa Indonesia [C]. Thamrin Husni, Sabardila Atiqa. International Conference on Data and Software Engineering. 2014

5. Bahasa Gado-Gado in Indonesian Popular Texts: Expanding Indonesian Identities through Code-Switching with English [D]. Martin, Nelly. 2017

6. Comparing the effect of group-based training along with text messaging and compact disc-based training on men's knowledge and attitude about participation in perinatal care: a cluster randomized control trial [O]. Vahideh Firouzan, Mahnaz Noroozi, Mojgan Mirghafourvand. 2020

7. Research and Application of Short Text Clustering Algorithms Based on Hadoop [O]. 王志沿. 2015
