Journal: Computer Speech and Language

Applications of graph theory to an English rhyming corpus

Abstract

How much can we infer about the pronunciation of a language, past or present, by observing which words its speakers rhyme? This paper explores the connection between pronunciation and network structure in sets of rhymes. We consider the rhyme graphs corresponding to rhyming corpora, where nodes are words and edges are observed rhymes. We describe the graph G corresponding to a corpus of ~12,000 rhymes from English poetry written c. 1900, and find a close correspondence between graph structure and pronunciation: most connected components show community structure that reflects the distinction between full and half rhymes. We build classifiers for predicting which components correspond to full rhymes, using a set of spectral and non-spectral features. Feature selection gives a small number (1-5) of spectral features, with accuracy and F-measure of ~90%, reflecting that positive components are essentially those without any good partition. We partition components of G via maximum modularity, giving a new graph, G', in which the "quality" of components, by several measures, is much higher than in G. We discuss how rhyme graphs could be used for historical pronunciation reconstruction.
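
The rhyme-graph construction described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the word pairs are toy data, not taken from the paper's corpus, and the function names (build_rhyme_graph, connected_components, modularity) are hypothetical. The sketch only evaluates Newman modularity for a partition supplied by hand; it does not perform the modularity maximization or compute the spectral features mentioned in the abstract.

```python
from collections import defaultdict


def build_rhyme_graph(rhyme_pairs):
    """Map each word to the set of words it was observed rhyming with."""
    graph = defaultdict(set)
    for a, b in rhyme_pairs:
        if a == b:
            continue  # ignore self-rhymes in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components as a list of sets (depth-first search)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def modularity(graph, communities):
    """Newman modularity Q of a partition of an unweighted, undirected graph."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in community for v in graph[u] if v in community) / 2
        degree_sum = sum(len(graph[u]) for u in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Toy data (hypothetical): two tight clusters joined by one "love"/"move" edge,
# standing in for a half rhyme that bridges two groups of full rhymes.
pairs = [
    ("love", "dove"), ("dove", "above"), ("love", "above"),
    ("move", "prove"), ("prove", "groove"), ("move", "groove"),
    ("love", "move"),
    ("day", "way"),
]

graph = build_rhyme_graph(pairs)
components = connected_components(graph)

# Compare leaving the six-word component whole against splitting it in two.
q_merged = modularity(graph, [
    {"love", "dove", "above", "move", "prove", "groove"},
    {"day", "way"},
])
q_split = modularity(graph, [
    {"love", "dove", "above"},
    {"move", "prove", "groove"},
    {"day", "way"},
])
print(len(components), q_merged, q_split)
```

In this toy graph, splitting the six-word component at the single bridging edge gives a higher modularity than leaving it whole, which is the kind of community structure the abstract associates with components that mix full and half rhymes.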
