A Fast Algorithm for Local Rank Distance: Application to Arabic Native Language Identification


Abstract

A novel distance measure for strings, termed Local Rank Distance (LRD), was recently introduced. LRD is inspired by rank distance, but it is designed to conform to more general principles while being better adapted to specific data types, such as DNA strings or text. More precisely, LRD measures the local displacement of character n-grams between two strings. LRD has already demonstrated promising results in computational biology and native language identification, but the algorithm originally used to compute it is computationally expensive. In this paper, an efficient algorithm for LRD is proposed. The main efficiency improvement comes from building a positional inverted index for the character n-grams of one of the compared strings. Then, for each n-gram in the other string, a binary search is used to find the position of the nearest matching n-gram in the positional inverted index. The proposed algorithm is more than two orders of magnitude faster than the original algorithm. The paper also presents an application of the described algorithm: state-of-the-art results for Arabic native language identification from text documents.
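The indexing and binary-search steps described above can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: the abstract does not define LRD formally, so the code assumes that LRD sums, for every n-gram occurrence in one string, the offset to the nearest occurrence of the same n-gram in the other string, capped at a maximum offset `m` (also charged when the n-gram does not occur at all), and then adds the same quantity computed in the opposite direction. The function names and the defaults `n=3` and `m=50` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_positional_index(s: str, n: int) -> dict[str, list[int]]:
    """Positional inverted index: each character n-gram of s -> sorted start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(s) - n + 1):
        # Positions are appended in increasing order, so each list stays sorted.
        index[s[i:i + n]].append(i)
    return index


def one_sided_displacement(x: str, y: str, n: int, m: int) -> int:
    """Assumed one-directional LRD term (hypothetical definition, see lead-in):
    for each n-gram occurrence in x, the offset to the nearest identical
    n-gram in y, capped at m; m is charged if the n-gram is absent from y."""
    index = build_positional_index(y, n)
    total = 0
    for i in range(len(x) - n + 1):
        positions = index.get(x[i:i + n])  # .get does not insert missing keys
        best = m
        if positions:
            # Binary search for the first position >= i; the nearest match is
            # either that position or the one immediately before it.
            k = bisect_left(positions, i)
            if k < len(positions):
                best = min(best, positions[k] - i)
            if k > 0:
                best = min(best, i - positions[k - 1])
        total += best
    return total


def local_rank_distance(x: str, y: str, n: int = 3, m: int = 50) -> int:
    """Symmetric sketch: displacement of x's n-grams in y plus y's n-grams in x."""
    return one_sided_displacement(x, y, n, m) + one_sided_displacement(y, x, n, m)
```

For example, `local_rank_distance("ab", "ba", n=1, m=10)` returns 4, because each of the four unigram occurrences sits one position away from its match in the other string. Building the index is linear in the length of the indexed string, and each lookup is a binary search over the sorted positions of a single n-gram, so no scan over the whole other string is needed per n-gram.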

机译：最近引入了一种新颖的字符串距离度量，称为本地等级距离（LRD）。 LRD受等级距离的启发，但其设计旨在遵循更一般的原则，同时更适用于特定的数据类型，例如DNA字符串或文本。更准确地说，LRD测量两个字符串之间字符n-gram的局部位移。本地等级距离已在计算生物学和母语识别中显示出令人鼓舞的结果，但是用于计算LRD的算法在计算上却很昂贵。本文提出了一种有效的LRD算法。主要的效率改进是为比较的字符串之一中的字符n-gram建立位置倒排索引。然后，对于另一个字符串中的每个n-gram，使用二进制搜索来找到位置倒排索引中最匹配的n-gram的位置。所提出的算法比原始算法快两个数量级以上。本文还展示了所描述算法的应用。确实，已提供了最新的结果，可从文本文档中识别阿拉伯语。

Bibliographic record

Source
International Conference on Neural Information Processing | 2015 | pp. 390-400 | 11 pages
Author
Radu Tudor Ionescu
Original format: PDF
Keywords
Local Rank Distance; Rank distance; String kernel; String similarity; Character n-grams; Native language identification; Fast algorithm;

机译：当地等级距离;等级距离字符串内核;字符串相似度;字符n-gram;母语识别;快速算法;

Similar literature


1. Rank Distance with Applications in Similarity of Natural Languages [J]. Liviu P. Dinu. Fundamenta Informaticae, 2005, issue 1a4

2. Note from the Guest Editors: Special issue on Arabic Natural Language Processing and Speech Recognition: A study of algorithms, resources, tools, techniques, and commercial applications [J]. Mohammad A. M. Abushariah, Amy Neustein, Bassam H. Hammo. International Journal of Speech Technology, 2016, issue 2

3. Word sense disambiguation using evolutionary algorithms - Application to Arabic language [J]. Mohamed El Bachir Menai. Computers in Human Behavior, 2014, issue deca

4. A Fast Algorithm for Local Rank Distance: Application to Arabic Native Language Identification [C]. Radu Tudor Ionescu. International Conference on Neural Information Processing, 2015

5. Native Language Identification Using Phonetic Algorithms [D]. Smiley, Charese H., 2018

6. Developing fMRI protocol for clinical use: Comparison of 6 Arabic paradigms for brain language mapping in native Arabic speakers [O]. Rafat S. Mohtasib, Jamaan S. Alghamdi, Salah M. Baz, 2021

7. Challenges of Distance Learning in Language Classes: Based on the Experience of Distance Teaching of Arabic to Non-native Speakers in Light of the Coronavirus Pandemic [O]. Dalal Mohd Al-Assaf, 2021