...
首页> 外文期刊>Theoretical computer science >USING MULTISET DISCRIMINATION TO SOLVE LANGUAGE PROCESSING PROBLEMS WITHOUT HASHING
【24h】

USING MULTISET DISCRIMINATION TO SOLVE LANGUAGE PROCESSING PROBLEMS WITHOUT HASHING

机译:使用多集区分解决不散列的语言处理问题

获取原文
获取原文并翻译 | 示例
           

摘要

It is generally assumed that hashing is essential to solve many language processing problems efficiently; e.g. symbol table formation and maintenance, grammar manipulation, basic block optimization, and global optimization. This paper questions this assumption, and initiates development of an efficient alternative compiler methodology without hashing or sorting. The methodology rests on efficient solutions to the basic problem of detecting duplicate values in a multiset, which we call multiset discrimination. Paige and Tarjan (1987) gave an efficient solution to multiset discrimination for detecting duplicate elements occurring in a multiset of varying length strings. The technique was used to develop an improved algorithm for lexicographic sorting, whose importance stems largely from its use in solving a variety of isomorphism problems (Aho et al., 1974), The current paper and a related paper (Paige, 1994) show that full lexicographic sorting is not needed to solve these isomorphism problems, because they can be solved more efficiently using straightforward extensions to the simpler multiset discrimination technique. By reformulating language processing problems in terms of multiset discrimination, we also show how almost every subtask of compilation can be solved without hashing in worst case running time no worse (and frequently better) than the best previous expected time solution (under the assumption that one hash operation takes unit expected time), Because of their simplicity, our solutions may be of practical as well as theoretical interest. The various applications presented culminate with a new algorithm to solve iterated strength reduction folded with useless code elimination that runs in worst case asymptotic time and auxiliary space Theta(/L/ + /L'/), where /L/ and /L'/ represent the lengths of the initial and optimized programs, respectively. The previous best solution due to Cocke and Kennedy (1977) takes Omega(/L/(3)/L'/) has operations in the worst case. [References: 41]
机译:通常认为,散列对于有效解决许多语言处理问题至关重要。例如符号表的形成和维护,语法操作,基本块优化和全局优化。本文质疑这一假设,并着手开发一种无需哈希或排序的高效替代编译器方法。该方法基于有效的解决方案,以解决检测多集中重复值的基本问题,我们称之为多集区分。 Paige和Tarjan(1987)为多集判别提供了一种有效的解决方案,用于检测出现在可变长度字符串的多集中的重复元素。该技术被用于开发一种改进的词典编目排序算法,其重要性很大程度上源于其在解决各种同构问题中的用途(Aho等人,1974年),当前论文和相关论文(Paige,1994年)表明:不需要完整的字典分类法来解决这些同构问题,因为可以使用简单的多集判别技术的直接扩展来更有效地解决它们。通过以多集区分的方式重新表达语言处理问题,我们还展示了几乎所有的编译子任务都可以在不进行散列的情况下解决,而在最坏的情况下运行时间不会比以前的最佳预期时间解决方案差(并且常常更好)(假设哈希运算需要花费单位预期时间),由于其简单性,我们的解决方案可能具有实际意义和理论意义。提出的各种应用最终以一种新算法解决了迭代强度降低的问题,该算法通过无用代码消除折叠起来,在最坏情况下的渐近时间和辅助空间Theta(/ L / + / L'/),其中/ L /和/ L'/分别代表初始程序和优化程序的长度。由于Cocke和Kennedy(1977)提出的先前最佳解决方案采用了Omega(/ L /(3)/ L'/)在最坏的情况下运行。 [参考:41]

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号