Spelling Correction for Text Documents in Bahasa Indonesia Using Finite State Automata and Levinshtein Distance Method

Viny Christanti Mawardi; Niko Susanto; Dali Santun Naga

Abstract

Any writing mistake in a document can cause its information to be conveyed incorrectly. These days, most documents are written on a computer, so spelling correction is needed to fix writing mistakes. This work describes the design of a spelling corrector for text documents in the Indonesian language, which takes a document's text as input and produces a .txt file as output. For the implementation, 5,000 news articles were used as training data. The methods used include Finite State Automata (FSA), Levenshtein distance, and N-grams. The results are evaluated by perplexity, correction hit rate, and false positive rate. The lowest perplexity is obtained by the unigram model, with a value of 1.14. The highest correction hit rate, 71.20%, is achieved by both the bigram and trigram models, but the bigram model is faster, with an average processing time of 01:21.23 min. The unigram, bigram, and trigram models all have the same false positive rate of 4.15%. Because of the drawbacks of the FSA method, a modification was made, which raised the bigram model's correction hit rate to 85.44%.
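
As an illustration of the Levenshtein distance step named in the abstract, the following is a minimal sketch of how candidate corrections could be ranked by edit distance. It is not the authors' implementation: the function names `levenshtein` and `suggest`, the `max_distance` threshold, and the toy vocabulary are all assumptions made for this example, and the FSA and N-gram stages are not modeled.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop (distance is symmetric)
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance of word, nearest first.

    max_distance is a hypothetical cutoff chosen for this sketch.
    """
    scored = [(levenshtein(word, v), v) for v in vocabulary]
    return [v for d, v in sorted(scored) if d <= max_distance]


# Hypothetical toy vocabulary; a real system would build it from a training corpus.
vocabulary = ["makan", "makam", "malam", "minum"]
suggestions = suggest("makn", vocabulary)
```

Here `suggest("makn", vocabulary)` keeps only the words at distance 2 or less. In a pipeline like the one the abstract describes, an N-gram language model would then presumably choose among such candidates using the surrounding words.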

Bibliographic details

Source: MATEC Web of Conferences, 2018, Issue 3, 16 pages
Authors: Viny Christanti Mawardi; Niko Susanto; Dali Santun Naga
Original format: PDF
CLC classification: General industrial technology
Date added to database: 2022-08-18 18:48:17

