Using Sequence Similarity Networks to Identify Partial Cognates in Multilingual Wordlists


Abstract

Increasing amounts of digital data in historical linguistics necessitate the development of automatic methods for the detection of cognate words across languages. Recently developed methods work well on language families with moderate time depths, but they cannot identify cognate morphemes in words which are only partially related. Partial cognacy, however, is a frequently recurring phenomenon, especially in language families with productive derivational morphology. This paper presents a pilot approach for partial cognate detection in which networks are used to represent similarities between word parts, and cognate morphemes are identified with the help of state-of-the-art algorithms for network partitioning. The approach is tested on a newly created benchmark dataset with data from three sub-branches of Sino-Tibetan and yields very promising results, outperforming all algorithms which are not sensitive to partial cognacy.
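
The general idea in the abstract (a network over word parts, then a partitioning step) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy word forms and language labels are invented, similarity is a plain normalized edit distance, the threshold of 0.5 is arbitrary, and connected components stand in for the network partitioning algorithms the abstract mentions.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical strings."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def partial_cognate_sets(words, threshold=0.5):
    """Group morphemes across languages (hypothetical helper).

    words maps a language label to the list of morphemes of one word.
    Nodes are (language, position, morpheme) triples. An edge joins two
    morphemes from different languages whose similarity reaches the
    threshold. Connected components are returned as a simple stand-in
    for a real network partitioning algorithm.
    """
    nodes = [(lang, i, m)
             for lang, morphemes in words.items()
             for i, m in enumerate(morphemes)]
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in combinations(nodes, 2):
        if u[0] != v[0] and similarity(u[2], v[2]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Invented toy forms for one concept in three hypothetical languages.
toy = {"L1": ["mat", "kol"], "L2": ["mad", "pun"], "L3": ["mat"]}
for group in partial_cognate_sets(toy):
    print(group)
```

In this toy input the first morphemes of all three words end up in one group, while "kol" and "pun" each stay alone, so the words are treated as sharing one part rather than being cognate or non-cognate as wholes.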


Bibliographic details

Source: Annual meeting of the Association for Computational Linguistics, 2016, pp. 599-605 (7 pages)
Authors: Johann-Mattis List; Philippe Lopez; Eric Bapteste
Original format: PDF

Similar literature

1. Vitellogenin, a biomarker for environmental estrogenic pollution, of Reeves' pond turtles: analysis of similarity for its amino acid sequence and cognate mRNA expression after exposure to estrogen [J]. Tada N., Nakao A., Hoshi H. The Journal of Veterinary Medical Science, 2008, No. 3.
2. Vitellogenin, a Biomarker for Environmental Estrogenic Pollution, of Reeves' Pond Turtles: Analysis of Similarity for its Amino Acid Sequence and Cognate mRNA Expression after Exposure to Estrogen [J]. Aya NAKAO, Hidenobu HOSHI, Masahiro SAKA. The Journal of Veterinary Medical Science, 2008, No. 3.
3. Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks [J]. Qicheng Ma, Gung-Wei Chirn, Richard Cai. BMC Bioinformatics, 2005, No. 1.
4. Using Sequence Similarity Networks to Identify Partial Cognates in Multilingual Wordlists [C]. Johann-Mattis List, Philippe Lopez, Eric Bapteste. Annual meeting of the Association for Computational Linguistics, 2016.
5. The design and development of the Cognate Site Identifier (CSI) microarray to determine the comprehensive sequence specificity of any DNA-binding molecule [D]. Warren, Christopher L. 2008.
6. Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks [O]. Qicheng Ma, Gung-Wei Chirn, Richard Cai. 2005.
7. Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists [O]. Jäger, G., List, J., Safroniev, P. 2017.
