Indonesian graphemic syllabification using a nearest neighbour classifier and recovery procedure

Edwina Anky Parande; Suyanto Suyanto

Abstract

Automatic syllabification, the decomposition of a word into syllables, is an important component of automatic speech recognition (ASR) systems that use both syllable-based acoustic and language models. It can be performed on either phoneme or grapheme sequences. Phonemic syllabification is more complex than graphemic syllabification because it requires grapheme-to-phoneme (G2P) conversion as a preceding step. It generally gives high accuracy for many formal words, but its accuracy may decrease for person names. In contrast, graphemic syllabification is simpler and more promising for person names. This research develops a graphemic syllabification model that combines phonotactic rules with Fuzzy k-Nearest Neighbour in every Class (FkNNC). The phonotactic rules are designed to find some deterministic syllabification points, while FkNNC, as a statistical classifier, is expected to find the remaining stochastic syllabification points. A recovery procedure is proposed to correct the wrong syllabification points produced by FkNNC. Fivefold cross-validation on a dataset of 50k formal words selected from the Great Dictionary of the Indonesian Language shows that the proposed model gives a syllable error rate (SER) of 2.48%, which the proposed recovery procedure reduces to 2.27%. This is still higher than the SER of phonemic syllabification (only 0.99%). However, the model is capable of handling a dataset of 15k high-variance person names with an SER of 7.45%, which the recovery procedure reduces to 6.78%.
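
The abstract describes a division of labour: rules fix the deterministic syllable boundaries, FkNNC classifies the remaining gaps, and the result is scored by SER. The Python sketch below illustrates that pipeline under assumptions that are not taken from the paper: the single toy phonotactic rule (a lone consonant, or one of the digraphs ng, ny, kh, sy, between two vowels starts a new syllable), the four-grapheme context window, the Hamming distance, the membership formula (borrowed from Keller-style fuzzy k-NN with fuzzifier `m`), and SER defined as syllable-level edit distance divided by the number of reference syllables. All function names are hypothetical, and the recovery procedure is not sketched because the abstract does not say how it works.

```python
VOWELS = set("aiueo")
# Assumption: these digraphs behave as a single consonant.
DIGRAPHS = {"ng", "ny", "kh", "sy"}
PAD = "_"


def rule_points(word):
    """Toy deterministic rule (assumed, not the paper's rule set): a single
    consonant or digraph between two vowels starts a new syllable, so
    'mata' -> 'ma|ta' and 'dengan' -> 'de|ngan'. Returns the indices i such
    that a boundary falls just before word[i]."""
    points, i, n = set(), 1, len(word)
    while i < n:
        if word[i - 1] in VOWELS and word[i] not in VOWELS:
            size = 2 if word[i:i + 2] in DIGRAPHS else 1
            if i + size < n and word[i + size] in VOWELS:
                points.add(i)
            i += size
        else:
            i += 1
    return points


def window(word, i, size=2):
    """Context of the gap before word[i]: `size` graphemes on each side, padded."""
    padded = PAD * size + word + PAD * size
    return padded[i:i + 2 * size]


def samples(syllabified):
    """Turn 'ma-kan' into (window, label) pairs; label 1 marks a syllable boundary."""
    word = syllabified.replace("-", "")
    cuts, pos = set(), 0
    for syllable in syllabified.split("-")[:-1]:
        pos += len(syllable)
        cuts.add(pos)
    return [(window(word, i), int(i in cuts)) for i in range(1, len(word))]


def hamming(a, b):
    """Number of positions where two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))


def fknnc(x, train, k=3, m=2.0):
    """Fuzzy k-NN in every class (sketch): average the distances to the k
    nearest samples of EACH class, then convert them into fuzzy memberships
    (assumed formula: weight = distance ** (-2 / (m - 1)), normalised)."""
    dist = {}
    for label in sorted({y for _, y in train}):
        nearest = sorted(hamming(x, xi) for xi, yi in train if yi == label)[:k]
        dist[label] = sum(nearest) / len(nearest)
    best = min(dist, key=dist.get)
    if dist[best] == 0:  # exact matches only: crisp membership
        return best, {c: float(c == best) for c in dist}
    weights = {c: d ** (-2 / (m - 1)) for c, d in dist.items()}
    total = sum(weights.values())
    memberships = {c: w / total for c, w in weights.items()}
    return max(memberships, key=memberships.get), memberships


def syllabify(word, train, k=3):
    """Rules fix the deterministic points; FkNNC decides every other gap."""
    points = rule_points(word)
    for i in range(1, len(word)):
        if i not in points and fknnc(window(word, i), train, k)[0] == 1:
            points.add(i)
    cuts = [0] + sorted(points) + [len(word)]
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two syllable sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def ser(predicted, reference):
    """Assumed SER: total syllable edit errors / reference syllables, in %."""
    errors = sum(edit_distance(p, r) for p, r in zip(predicted, reference))
    return 100.0 * errors / sum(len(r) for r in reference)


if __name__ == "__main__":
    train = [s for w in ["ma-kan", "ban-tu", "lam-pu", "ka-ta"] for s in samples(w)]
    pred = [syllabify(w, train) for w in ["mandi", "kata"]]
    print(pred, ser(pred, [["man", "di"], ["ka", "ta"]]))
```

Because the memberships in this sketch decrease monotonically with the averaged distance, the crisp decision is simply the class with the smallest average distance; the membership values themselves are what a later correction step could use as confidence scores.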

Bibliographic Details

Source: International Journal of Speech Technology, 2019, No. 1, pp. 13-20 (8 pages)
Authors: Edwina Anky Parande; Suyanto Suyanto
Affiliation (both authors): School of Computing, Telkom University, Bandung, West Java 40257, Indonesia

Indexing Information
Original format: PDF
Text language: English
Keywords: Bahasa Indonesia; Fuzzy; Graphemic syllabification; Nearest neighbour; Phonotactic rules; Speech recognition

Similar Literature

1. Detecting Contextual Faults in Unmanned Aerial Vehicles Using Dynamic Linear Regression and K-Nearest Neighbour Classifier [J]. A. Alos, Z. Dahrouj. Gyroscopy and Navigation, 2020, No. 1.

2. Turbine blade tip clearance determination using microwave measurement and k-nearest neighbour classifier [J]. Aslinezhad Mehdi, Hejazi Maryam A. Measurement, 2020.

3. Fuzzy detection orange tree leaves diseases using a co-occurrence matrix-based K-nearest neighbours classifiers [J]. Fatimazahra Jakjoud, Anas Hatim, Abella Bouaddi. International Journal of Knowledge Engineering and Soft Data Paradigms, 2019, No. 3/4.

4. Indonesian Graphemic Syllabification Using n-Gram Tagger with State-Elimination [C]. Rezza Nafi Ismail, Suyanto Suyanto. International Conference on Information and Communication Technology, 2020.

5. Advances in Machine Learning: Nearest Neighbour Search, Learning to Optimize and Generative Modelling [D]. Ke Li. 2019.

6. Evolutionary sequential genetic search technique-based cancer classification using fuzzy rough nearest neighbour classifier [O]. Loganathan Meenachi, Srinivasan Ramakrishnan. 2018.

7. IoT-cloud based healthcare model for COVID-19 detection: an enhanced k-Nearest Neighbour classifier based approach [O]. Rajendrani Mukherjee, Aurghyadip Kundu, Indrajit Mukherjee, et al. 2021.

8. Natural Language Text Classification and Filtering with Trigrams and Evolutionary Nearest Neighbour Classifiers [R]. W. B. Langdon. Software Engineering (SEN), 2000.
