Journal: Science of Computer Programming

Mutable strings in Java: design, implementation and lightweight text-search algorithms

Abstract

The Java string classes, String and StringBuffer, lie at the extremes of a spectrum (immutable, reference based, and mutable, content based). Analogously, available text-search methods on string classes are implemented either as trivial, brute-force double loops, or as very sophisticated and resource-consuming regular-expression search methods. Motivated by our experience in data-intensive text applications, we propose a new string class, MutableString, which tries to get the right balance between extremes in both cases. Mutable strings can be in one of two states, compact and loose, in which they behave more like String and StringBuffer, respectively. Moreover, they support a wide range of sophisticated text-search algorithms with a very low resource usage and set-up time, using a new, very simple randomised data structure (a generalisation of Bloom filters) that stores an approximation from above of a lattice-valued function. Computing the function value requires a constant number of steps, and the error probability can be balanced with space usage. As a result, we obtain practical implementations of Boyer-Moore-type algorithms that can be used with very large alphabets, such as Unicode collation elements. The techniques we develop are very general and amenable to a wide range of applications.
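To make the idea of "an approximation from above of a lattice-valued function" more concrete, here is a minimal sketch in Java. It is not the authors' implementation: the class names `UpperApproximator` and `ApproxSearch`, the hash mixing, and the table sizes are all assumptions made for illustration. The lattice is taken to be the non-negative integers, with `max` as join and `min` as meet. Storing a value raises every cell the key hashes to, and a lookup takes the minimum over those cells, so the result can be too large but never too small. The second class uses this table in a Horspool-style (Boyer-Moore family) search, where an overestimated "last occurrence" only shortens a shift and therefore cannot skip a match.

```java
/** Hypothetical sketch: a Bloom-filter-like table that bounds stored int values from above. */
final class UpperApproximator {
    private final int[] cells;  // one lattice value per cell; 0 is the bottom element
    private final int[] seeds;  // one seed per hash function (assumed: at least one)

    UpperApproximator(int numCells, int numHashes) {
        cells = new int[numCells];
        seeds = new int[numHashes];
        for (int i = 0; i < numHashes; i++) seeds[i] = 0x9E3779B9 * (i + 1);
    }

    // Illustrative integer mixing; any reasonable hash family would do.
    private int index(int key, int seed) {
        int h = (key ^ seed) * 0x85EBCA6B;
        h ^= h >>> 16;
        return Math.floorMod(h, cells.length);
    }

    /** Records value for key by raising every cell the key hashes to (join = max). */
    void put(int key, int value) {
        for (int seed : seeds) {
            int i = index(key, seed);
            cells[i] = Math.max(cells[i], value);
        }
    }

    /** Returns a value >= the largest value stored for key (meet = min over its cells). */
    int get(int key) {
        int result = Integer.MAX_VALUE;
        for (int seed : seeds) result = Math.min(result, cells[index(key, seed)]);
        return result;
    }
}

/** Hypothetical sketch: Horspool-style search whose shift table is an UpperApproximator. */
final class ApproxSearch {
    static int indexOf(CharSequence text, CharSequence pattern) {
        int m = pattern.length(), n = text.length();
        if (m == 0) return 0;
        // Store (last index + 1) of each character among pattern[0 .. m-2]; 0 means "absent".
        UpperApproximator last = new UpperApproximator(64, 2);
        for (int j = 0; j < m - 1; j++) last.put(pattern.charAt(j), j + 1);
        int pos = 0;
        while (pos <= n - m) {
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) j--;
            if (j < 0) return pos;
            // Stored values are at most m - 1, so the shift is always at least 1.
            // An overestimate from get() only makes the shift smaller, which stays safe.
            pos += m - last.get(text.charAt(pos + m - 1));
        }
        return -1;
    }
}
```

The table size is fixed regardless of the alphabet, which is the property that matters for very large alphabets: a collision costs some shift length, not correctness. For example, `ApproxSearch.indexOf("the quick brown fox", "brown")` finds the same position as `String.indexOf`.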
