Applying Clustering and Topic Modeling to Automatic Analysis of Citizens’ Comments in E-Government

Gunay Y. Iskandarli

Abstract

This paper proposes an approach for analyzing citizens' comments in e-government using topic modeling and clustering algorithms. Its main purpose is to determine which topics the comments written by citizens in the e-government environment address, and thereby to help improve the quality of e-services. In the proposed approach, citizens' comments are first clustered, and topics are then extracted from each cluster, which shows which topics citizens are discussing. Applying clustering and topic modeling raises two problems, however: the document vectors are large, and semantically related documents can end up in different clusters. To address this, the approach uses the semantic similarity of words to reduce dimensionality: from each group of semantically similar words, only one is kept and the others are discarded, which reduces the size of the vectors. The documents are then clustered and topics are extracted from each cluster. The proposed method can significantly reduce the size of a large set of documents, save time spent analyzing the data, and improve the quality of clustering and of the LDA algorithm.
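The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the hand-written `SYNONYM_GROUPS` stand in for whatever semantic-similarity measure the author uses, the single-pass `greedy_cluster` function stands in for the clustering algorithm, and the most frequent terms per cluster stand in for LDA topic extraction. The sample comments are invented.

```python
import math
from collections import Counter

# Hypothetical synonym groups; a real system would derive these from a
# semantic-similarity measure, which the abstract does not specify.
SYNONYM_GROUPS = [
    {"fee", "charge", "payment"},
    {"slow", "delayed", "late"},
    {"website", "portal", "site"},
]

def build_canonical_map(groups):
    """Map each word to one representative of its group (alphabetically first)."""
    return {word: min(group) for group in groups for word in group}

def tokenize(text):
    tokens = (w.strip(".,!?").lower() for w in text.split())
    return [t for t in tokens if t]

def vectorize(docs_tokens):
    """Build a bag-of-words count vector per document over a shared vocabulary."""
    vocab = sorted({t for doc in docs_tokens for t in doc})
    vectors = []
    for doc in docs_tokens:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def greedy_cluster(vectors, threshold=0.3):
    """Single pass: join the first cluster whose seed is similar enough,
    otherwise start a new cluster. A stand-in for a real clustering algorithm."""
    seeds, labels = [], []
    for vec in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels

def top_terms(docs_tokens, labels, n=2):
    """Most frequent terms per cluster; a simple stand-in for LDA topics."""
    per_cluster = {}
    for tokens, label in zip(docs_tokens, labels):
        per_cluster.setdefault(label, Counter()).update(tokens)
    return {lab: [t for t, _ in c.most_common(n)] for lab, c in per_cluster.items()}

comments = ["portal slow", "website delayed", "fee high", "charge high"]
mapping = build_canonical_map(SYNONYM_GROUPS)

raw_docs = [tokenize(c) for c in comments]
reduced_docs = [[mapping.get(t, t) for t in doc] for doc in raw_docs]

raw_vocab, raw_vectors = vectorize(raw_docs)
reduced_vocab, reduced_vectors = vectorize(reduced_docs)

raw_labels = greedy_cluster(raw_vectors)
reduced_labels = greedy_cluster(reduced_vectors)
topics = top_terms(reduced_docs, reduced_labels)

print(len(raw_vocab), len(reduced_vocab))
print(raw_labels, reduced_labels)
print(topics)
```

In this toy data, the first two comments share no words before the reduction and therefore fall into different clusters. After synonyms are collapsed to one representative, the vocabulary shrinks and the two comments land in the same cluster, which is the effect the abstract attributes to the method.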

