Phoneme inventory, trigrams and geographic location as features for clustering different Philippine languages


Abstract

In this paper, orthographic, geographic, and phonetic features were explored to cluster 32 Philippine languages and identify closely related languages. For the orthographic data, we collected religious text documents online and used 100,000 words per language as training data. These words were cleaned, and trigram profiles were generated. For the geographic feature, we used the location where each language is spoken. For the phonetic feature, we used the phoneme inventory of each language. The languages were clustered using two algorithms, hierarchical clustering and k-means. Purity was used as the evaluation metric to validate the resulting clusters. For both hierarchical clustering and k-means, the highest purity value of a cluster is 0.67, which indicates that members of that cluster have similar attributes. As future work, semantic features can be added to improve the data set, and additional languages can be considered.
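The abstract names two computable steps: building character trigram profiles from word lists and scoring clusters by purity. The following is a minimal sketch of both, not the authors' implementation. The function names, the underscore word-boundary padding, the `top_n` cutoff, and the toy cluster and gold-group labels are all assumptions made for illustration; the abstract does not state how profiles were built or which reference grouping was used for purity.

```python
from collections import Counter


def trigram_profile(words, top_n=300):
    """Return the top_n most frequent character trigrams in a word list.

    Padding each word with "_" to mark boundaries is an assumption,
    as is the default profile size.
    """
    counts = Counter()
    for word in words:
        padded = f"_{word.lower()}_"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return [gram for gram, _ in counts.most_common(top_n)]


def cluster_purity(cluster_labels, gold_labels):
    """Return {cluster: purity}, where purity is the share of the
    cluster's members that carry its most common gold label."""
    members = {}
    for cluster, gold in zip(cluster_labels, gold_labels):
        members.setdefault(cluster, []).append(gold)
    return {
        cluster: Counter(golds).most_common(1)[0][1] / len(golds)
        for cluster, golds in members.items()
    }


# Hypothetical toy data: six languages, two clusters, two gold groups.
clusters = [0, 0, 0, 1, 1, 1]
groups = ["A", "A", "B", "B", "B", "B"]
purity = cluster_purity(clusters, groups)
top_gram = trigram_profile(["banana"], top_n=1)
```

In this toy data, cluster 0 holds two members of group "A" and one of group "B", so two of its three members share the majority label, while cluster 1 is uniform. The cluster assignments themselves would come from a hierarchical or k-means routine applied to feature vectors derived from the trigram, location, and phoneme data.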


Bibliographic details

Source
2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Technique | 2016 | pp. 137-140 | 4 pages
Conference location: Bali (ID)
Authors
Angelica Dela Cruz; Nathaniel Oco; Leif Romeritch Syliongka; Rachel Edita Roxas
Author affiliations

College of Computer Studies, National University, Philippines (listed for each of the four authors)

Conference organizer
Original format: PDF
Text language: English (eng)
Chinese Library Classification
Keywords
Clustering algorithms; Measurement; Standardization; Speech; Databases; Phylogeny; Algorithm design and analysis;

Date added to database: 2022-08-26 14:30:24

Similar literature

1. Phoneme Set Design Based on Integrated Acoustic and Linguistic Features for Second Language Speech Recognition [J]. Xiaoyun WANG, Tsuneo KATO, Seiichi YAMAMOTO. IEICE Transactions on Information and Systems. 2017, No. 4

2. Particular Features of the Realization of the Phoneme /j/ in the Position after a Consonant before a Vowel in the Russian Literary Language [J]. Leonid Leonidovi Kasatkin, Rozalija Francevna Kasatkina. Russian Linguistics. 2004, No. 2

3. Location, location, location: geographic clustering of lower-extremity amputation among medicare beneficiaries with diabetes [J]. Margolis DJ, Hoffstad O, Nafash J. Diabetes Care. 2011, No. 11

4. Phoneme inventory, trigrams and geographic location as features for clustering different philippine languages [C]. Angelica Dela Cruz, Nathaniel Oco, Leif Romeritch Syliongka. Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Technique. 2016

5. Automatic language identification with sequences of language-independent phoneme clusters [D]. Berkling, Kay Margarethe. 1996

6. Location Location Location: Geographic Clustering of Lower-Extremity Amputation Among Medicare Beneficiaries With Diabetes [O]. David J. Margolis, Ole Hoffstad, Jeffrey Nafash. 2011

7. The acoustic diversity in the phoneme inventories of the world’s languages [O]. Magdalena Igras, Stanisław Kacprzak, Mariusz Mąsior. 2015
