Deduplication Method for Ukrainian Last Names, Medicinal Names, and Toponyms Based on Metaphone Phonetic Algorithm


Abstract

This paper attempts to optimize phonetic search for fuzzy matching tasks, such as deduplicating records in databases and registers, in order to reduce the number of errors in personal data entry (for instance, in last names). An analysis of the most common last names in Ukraine shows that the majority are of Ukrainian or Russian origin (the latter are likewise rendered according to the phonetic rules of the Ukrainian language). The rules for pronouncing and writing last names in Ukrainian differ fundamentally from those assumed by the basic algorithms for English and differ considerably from those for Russian, so a phonetic algorithm should take into account the peculiarities of how Ukrainian last names are formed. Using a phonetic algorithm gives significant advantages in search and deduplication over established approaches such as computing the Levenshtein, Damerau-Levenshtein, Hamming, Jaro, or Jaro-Winkler distance, or a q-gram index [1]. The task of searching by last name was previously formalized for English [2, 3], Russian [4, 5], and some other languages, but this paper makes the first such attempt for Ukrainian. The paper presents the results of an experiment on building phonetic indices, as well as the performance gains obtained when the generated indices are used. A method for tailoring the search to other domains and to several related languages is presented separately, with the search for medicines as an example. Search by place names in Ukrainian and Russian was also optimized separately. Because many city and street names in Ukraine have changed abruptly, the latest relevant data were collected to compile an up-to-date list of names. Among the existing phonetic search algorithms for languages written in Cyrillic, Metaphone performed best.
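The idea of deduplicating through a phonetic index can be illustrated with a minimal Python sketch. It is not the Metaphone adaptation described in the paper: the folding table `_FOLD` and the helpers `toy_phonetic_key`, `build_index`, and `duplicate_groups` are hypothetical names, and the rules (dropping the soft sign and apostrophes, merging a few similar-sounding vowels, collapsing doubled letters) are assumptions chosen only to show the mechanism.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical folding rules for illustration only; NOT the rule set from the paper.
_FOLD = str.maketrans({
    "Ь": None, "'": None, "ʼ": None, "’": None,  # drop soft sign and apostrophes
    "І": "И", "Ї": "И", "Ы": "И",                # merge similar-sounding vowels
    "Є": "Е", "Э": "Е",
    "Я": "А", "Ю": "У",
})


def toy_phonetic_key(name: str) -> str:
    """Reduce a Cyrillic name to a coarse key so that spelling variants collide."""
    folded = name.strip().upper().translate(_FOLD)
    # Collapse runs of the same letter ("НН" -> "Н").
    return "".join(ch for ch, _ in groupby(folded))


def build_index(names):
    """Map each phonetic key to the list of names that produce it."""
    index = defaultdict(list)
    for name in names:
        index[toy_phonetic_key(name)].append(name)
    return dict(index)


def duplicate_groups(names):
    """Return groups of names that share a key, i.e. deduplication candidates."""
    return [group for group in build_index(names).values() if len(group) > 1]


if __name__ == "__main__":
    sample = ["Коваленко", "Коваленько", "Ковалєнко", "Бондаренко"]
    for group in duplicate_groups(sample):
        print(group)
```

The index is built in one pass over the names, and candidate duplicates are then found by key lookup, whereas distance-based approaches such as Levenshtein need a comparison for every pair of records. A real rule set would have to be much more careful than this toy table, which can also merge names that are genuinely different.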


Bibliographic details

Source: International Conference on Computer Science, Engineering and Education Applications | 2021 | x, 683 pages | 16 pages in total
Authors: Zhengbing Hu; V. Buriachok; V. Sokolov
Original format: PDF
Chinese Library Classification: 73.9083
Keywords: Deduplication; Fuzzy coincidence; Phonetic rule; Phonetic algorithm; Ukrainian last name; Ukrainian surname; International nonproprietary name; Medicine; Medication; Drug; Toponym; Metaphone


