News Article Text Classification in Indonesian Language

Rini Wongso; Ferdinand Ariandy Luwinda; Brandon Christian Trisnajaya; Olivia Rusli; Rudy

Abstract

This research aims to identify an appropriate algorithm for automatically classifying news articles in the Indonesian language. The dataset was collected by web crawling www.cnnindonesia.com. Each document first undergoes text preprocessing, consisting of lemmatization and stopword removal, in order to minimize noise in the document. Next, feature selection is applied to further separate important words from less important ones. The document is then passed to the classifier. We compare TF-IDF and SVD for feature selection, and Multinomial Naïve Bayes, Multivariate Bernoulli Naïve Bayes, and Support Vector Machine as classifiers. Based on the test results, the combination of TF-IDF and the Multinomial Naïve Bayes classifier gives the highest result among the compared algorithms, with a precision of 0.9841519 and a recall of 0.9840000. This result outperforms a previous similar study on classifying news articles in the Indonesian language, which obtained 85% accuracy.
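The best-performing combination described in the abstract (TF-IDF weighting followed by Multinomial Naïve Bayes) can be sketched roughly as below. This is a minimal illustration in plain Python, not the authors' implementation: the toy corpus, the tiny stopword list, the smoothed IDF formula, the Laplace smoothing constant `alpha`, and all function names are assumptions made for the example, and lemmatization is omitted.

```python
import math
from collections import Counter

# Toy, made-up training data (assumption; NOT the paper's crawled dataset).
TRAIN = [
    ("harga saham naik pasar modal", "ekonomi"),
    ("bank sentral turunkan suku bunga", "ekonomi"),
    ("tim sepak bola menang pertandingan", "olahraga"),
    ("pemain bola cetak gol pertandingan", "olahraga"),
]
STOPWORDS = {"di", "dan", "yang"}  # hypothetical, deliberately tiny list


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no lemmatization here)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with Laplace smoothing."""
    model = {}
    for y in set(labels):
        weights = Counter()
        for vec, label in zip(vectors, labels):
            if label == y:
                for t, w in vec.items():
                    weights[t] += w
        total = sum(weights.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(y) / len(labels))
        log_lik = {t: math.log((weights.get(t, 0.0) + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict(model, vec):
    """Return the class with the highest log-posterior score."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items())
    return max(model, key=score)


def precision_recall(y_true, y_pred, label):
    """Per-class precision and recall from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Fit the sketch on the toy corpus.
train_docs = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
model = train_nb([tfidf(d, idf) for d in train_docs], train_labels, set(idf))


def classify(text):
    """Preprocess, weight, and classify a raw text string."""
    return predict(model, tfidf(tokenize(text), idf))
```

Calling `classify("gol pertandingan bola")` on this toy model picks the sports class, and `precision_recall` can be applied to held-out predictions to obtain per-class figures of the kind reported above. How the paper averages precision and recall across classes is not stated in the abstract, so the helper is left per-class.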


Bibliographic record

Source: Procedia Computer Science, 2017 (issue 2017), 7 pages
Authors: Rini Wongso; Ferdinand Ariandy Luwinda; Brandon Christian Trisnajaya; Olivia Rusli; Rudy
Original format: PDF
CLC classification: Computing technology, computer technology
