International Conference on Language Resources and Evaluation

Dialect Clustering with Character-Based Metrics: in search of the boundary of language and dialect

Abstract

We present a universal, character-based method for representing sentences, from which the distance between any two sentences can be calculated. With a small alphabet, the representation can function as a proxy for phonemes. As one of its main uses, we carry out dialect clustering: we cluster a corpus that mixes dialects or sub-languages into sub-groups and check whether those sub-groups coincide with the conventional boundaries of dialects and sub-languages. Using data from multiple Japanese dialects and multiple Slavic languages, we report how well each group clusters, as a partial response to the question of what separates languages from dialects.
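The abstract does not specify the representation or the distance, so the following is only a minimal sketch of the general idea, under stated assumptions: sentences are represented as character bigram counts, compared with cosine distance, and grouped by average-linkage agglomerative clustering. The function names (`char_ngrams`, `cosine_distance`, `cluster`) and the toy strings are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
import math


def char_ngrams(sentence, n=2):
    """Count overlapping character n-grams (an assumed stand-in representation)."""
    text = sentence.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two n-gram count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def cluster(sentences, k, n=2):
    """Merge the closest groups (average linkage) until k groups remain."""
    vecs = [char_ngrams(s, n) for s in sentences]
    dist = {(i, j): cosine_distance(vecs[i], vecs[j])
            for i, j in combinations(range(len(vecs)), 2)}
    clusters = [[i] for i in range(len(vecs))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two groups.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy strings only; these are not dialect data.
toy = ["aaaa bbbb", "aaab bbbb", "xyxy zzzz", "xyxy zzzy"]
print(cluster(toy, k=2))  # [[0, 1], [2, 3]]
```

With real data, one would compare the resulting groups against the known dialect or language labels to see how closely the clusters follow the conventional boundaries.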
