

Statistical Substring Reduction in Linear Time



Abstract

We study the problem of efficiently removing equal-frequency n-gram substrings from an n-gram set, formally called Statistical Substring Reduction (SSR). SSR is a useful operation in corpus-based multi-word unit research and in the new-word identification task of oriental language processing. We present a new SSR algorithm that runs in linear time (O(n)), and prove its equivalence with the traditional O(n^2) algorithm. In particular, using experimental results from several corpora of different sizes, we show that it is possible to achieve performance close to that theoretically predicted for this task. Even on a small corpus the new algorithm is several orders of magnitude faster than the O(n^2) one. These results show that our algorithm is reliable and efficient, and is therefore an appropriate choice for large-scale corpus processing.
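To make the problem statement concrete, here is a minimal sketch of the quadratic baseline as the abstract describes it: an n-gram is dropped when it is a proper substring of another n-gram in the set with the same frequency. This is not the paper's linear-time algorithm, whose details the abstract does not give. The function name `reduce_substrings_naive`, the use of character-level n-grams stored as Python strings, and the sample counts are all assumptions made for illustration.

```python
from typing import Dict


def reduce_substrings_naive(counts: Dict[str, int]) -> Dict[str, int]:
    """Quadratic baseline (illustrative, not the paper's method).

    Drops every n-gram that is a proper substring of another n-gram
    in the set with exactly the same frequency.
    """
    kept = {}
    for gram, freq in counts.items():
        # Compare against every other n-gram: O(n^2) pairs overall.
        redundant = any(
            other != gram and other_freq == freq and gram in other
            for other, other_freq in counts.items()
        )
        if not redundant:
            kept[gram] = freq
    return kept


# Hypothetical character n-gram counts, for illustration only.
counts = {"自然": 10, "自然语言": 10, "语言": 15, "语言处理": 8}
result = reduce_substrings_naive(counts)
# "自然" is removed: it occurs inside "自然语言" with the same count (10).
# "语言" is kept: its count (15) differs from every n-gram that contains it.
```

The nested scan over all pairs is what makes this baseline quadratic in the number of n-grams, which is the cost the abstract says the linear-time algorithm avoids.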


Bibliographic details

Source
International Joint Conference on Natural Language Processing, 2004, 6 pages
Authors
Xueqiang Lue; Le Zhang; Junfeng Hu; Association for Computational Linguistics (ACL); Association for Computational Linguistics and Chinese Language Processing (ACLCLP); Association of Natural Language Processing (ANLP)
Original format: PDF
Chinese Library Classification: programming languages, algorithmic languages

