A two-stage codebook building method using fast WAN

Abstract

Pattern-matching-based document compression systems rely on finding a small set of patterns that can be used to represent the whole document. When analyzing and comparing this kind of system, two factors have to be considered: the compression ratio attained, and the speed and associated complexity of codebook building. To reduce the computational burden of the pattern-matching operation while keeping a good compression ratio, we propose a new fast algorithm for the WAN (weighted AND-NOT) matching process. Codebook building is thus performed in two stages: the first is based on FWAN (fast WAN) with a loose threshold; in the second, a more accurate but slower method (CTM or EPM) is applied to the initial approximate codebook. This screening greatly reduces the search space of the clustering procedure implicit in building the library, without altering the compression ratio. Experimental results show good speed performance for the new algorithm: it is at least three times faster than the usual WAN.
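
The abstract does not give the WAN or FWAN formulas, so the following is only a minimal sketch of the two-stage idea under stated assumptions. It assumes one plausible reading of weighted AND-NOT: the error maps "A AND NOT B" and "B AND NOT A" are built separately, and each error pixel is weighted by the number of same-map error pixels in its 3x3 neighbourhood, so clustered errors cost more than isolated noise. All names (`wan_score`, `pixel_diff`, `build_codebook`) are hypothetical. The plain WAN score stands in for FWAN, whose speed-up is not reproduced. The second-stage matcher is a caller-supplied function standing in for CTM or EPM, which are not implemented here. Symbols are assumed to be already aligned and padded to the same size.

```python
from typing import Callable, List, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels, 1 = black (assumed representation)


def wan_score(a: Bitmap, b: Bitmap) -> int:
    """Weighted AND-NOT style mismatch score (illustrative assumption, not FWAN)."""
    h, w = len(a), len(a[0])
    score = 0
    # Build the two directional error maps separately: a AND NOT b, then b AND NOT a.
    for first, second in ((a, b), (b, a)):
        err = [[first[y][x] & (1 - second[y][x]) for x in range(w)] for y in range(h)]
        for y in range(h):
            for x in range(w):
                if err[y][x]:
                    # Weight = number of error pixels in the 3x3 window, itself included.
                    score += sum(
                        err[j][i]
                        for j in range(max(0, y - 1), min(h, y + 2))
                        for i in range(max(0, x - 1), min(w, x + 2))
                    )
    return score


def pixel_diff(a: Bitmap, b: Bitmap) -> int:
    """Count of differing pixels; a hypothetical helper for building a fine matcher."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_codebook(
    symbols: List[Bitmap],
    loose_threshold: int,
    fine_match: Callable[[Bitmap, Bitmap], bool],
) -> Tuple[List[int], List[int]]:
    """Two-stage clustering sketch: coarse WAN screening, then a finer matcher."""
    # Stage 1: compare each symbol with the first member of each coarse cluster.
    coarse: List[List[int]] = []
    for idx, sym in enumerate(symbols):
        for cluster in coarse:
            if wan_score(symbols[cluster[0]], sym) <= loose_threshold:
                cluster.append(idx)
                break
        else:
            coarse.append([idx])

    # Stage 2: the finer matcher only runs inside each coarse cluster,
    # which is what shrinks its search space.
    codebook: List[int] = []
    assignment = [0] * len(symbols)
    for cluster in coarse:
        prototypes: List[int] = []
        for idx in cluster:
            for proto in prototypes:
                if fine_match(symbols[proto], symbols[idx]):
                    assignment[idx] = proto
                    break
            else:
                prototypes.append(idx)
                assignment[idx] = idx
        codebook.extend(prototypes)
    return codebook, assignment
```

For example, with a 3x3 ring, an identical copy, a ring missing one corner, and a single centre dot, a loose threshold of 4 groups the three ring-like symbols into one coarse cluster, and an exact-equality fine matcher then splits off the nicked ring as its own prototype. The dot is never compared against the rings in the second stage.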