Character-Based Machine Learning vs. Language Modeling for Diacritics Restoration

Kapočiūtė-Dzikienė Jurgita; Davidsonas Andrius; Vidugirienė Aušra

Abstract

In this research we compare two approaches to diacritics restoration, character-based machine learning and language modeling, and offer the best solution to the diacritization problem. The parameters of the tested approaches (i.e., a large variety of feature types for the character-based method and the value of n for the n-gram language-modeling method) were tuned to achieve the highest possible accuracy. Although the main focus is on the Lithuanian language, we posit that the findings can also be applied to other, similar languages (Latvian or Slavic). In the experiments we measured the performance of the approaches on 10 domains (including normative texts and non-normative Internet comments). The best results, reaching ~99.5% accuracy on characters and ~98.4% on words, were achieved with the tri-gram language-modeling method. It outperformed the character-based machine learning approach with an optimally composed feature set by ~1.4% and ~3.8% in accuracy on characters and words, respectively. DOI: http://dx.doi.org/10.5755/j01.itc.46.4.18066
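As a rough illustration of the n-gram language-modeling idea and of the character-level and word-level accuracy measures mentioned in the abstract, here is a minimal Python sketch. It is not the authors' implementation: the toy corpus, the function names (`train`, `restore`, `accuracy`), the greedy left-to-right decoding, and the crude count-based back-off are all assumptions made for illustration, and a real system would use a properly smoothed language model trained on a large corpus.

```python
from collections import Counter, defaultdict

# Lithuanian letters with diacritics mapped to their plain ASCII base letters.
STRIP = str.maketrans("ąčęėįšųūžĄČĘĖĮŠŲŪŽ", "aceeisuuzACEEISUUZ")


def strip_diacritics(text):
    """Remove diacritics; the result has the same length as the input."""
    return text.translate(STRIP)


def train(sentences, n=3):
    """Collect candidate forms per stripped word and n-gram counts of orders 1..n."""
    candidates = defaultdict(set)
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        padded = ["<s>"] * (n - 1) + words
        for word in words:
            candidates[strip_diacritics(word)].add(word)
        for i in range(n - 1, len(padded)):
            for k in range(1, n + 1):
                counts[tuple(padded[i - k + 1 : i + 1])] += 1
    return candidates, counts


def restore(text, candidates, counts, n=3):
    """Greedily pick, for each word, the candidate best supported by its left context."""
    history = ["<s>"] * (n - 1)
    output = []
    for word in text.split():
        options = candidates.get(word, {word})  # unknown words are left unchanged

        def score(candidate):
            # Counts of the n-gram ending in the candidate, longest context first,
            # so lower orders only break ties (a crude back-off, assumed here).
            return tuple(
                counts[tuple(history[len(history) - (k - 1):] + [candidate])]
                for k in range(n, 0, -1)
            )

        best = max(sorted(options), key=score)
        output.append(best)
        history = (history + [best])[1:]  # keep the last n-1 restored words
    return " ".join(output)


def accuracy(restored, reference):
    """Return (character accuracy, word accuracy) for aligned texts."""
    r_words, g_words = restored.split(), reference.split()
    word_acc = sum(a == b for a, b in zip(r_words, g_words)) / len(g_words)
    r_chars, g_chars = "".join(r_words), "".join(g_words)
    char_acc = sum(a == b for a, b in zip(r_chars, g_chars)) / len(g_chars)
    return char_acc, word_acc


# Toy corpus (an assumption): "karštas" (hot) and "karstas" (coffin) collide
# once diacritics are stripped, so only the context can disambiguate them.
CORPUS = ["šiandien labai karštas oras", "medinis karstas buvo sunkus"]
candidates, counts = train(CORPUS, n=3)

print(restore("siandien labai karstas oras", candidates, counts))
print(restore("medinis karstas buvo sunkus", candidates, counts))
```

In this sketch the ambiguous form is resolved only by the preceding words; word accuracy is necessarily the stricter measure, since a single wrong character makes the whole word count as an error.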

Bibliographic record

Source
Engineering Economics | 2017, Issue 4 | 13 pages
Authors
Kapočiūtė-Dzikienė Jurgita; Davidsonas Andrius; Vidugirienė Aušra
Original format: PDF
Chinese Library Classification: Industrial economy
