首页> 外文会议>Compression and Complexity of Sequences 1997 >Inferring lexical and grammatical structure from sequences
【24h】

Inferring lexical and grammatical structure from sequences

机译:从序列中推断词汇和语法结构

获取原文

摘要

In a wide variety of sequences from various sources, from musicand text to DNA and computer programs, two different but related kindsof structure can be discerned. First, some segments tend to be repeatedexactly, such as motifs in music, words or phrases in text, identifiersand syntactic idioms in computer programs. Second, these segmentsinteract with each other in variable but constrained ways. For example,in English text only certain syntactic word classes can appear after theword `the' many parts of speech (such as verbs) are necessarilyexcluded. This paper shows how these kinds of structure can be inferredautomatically from sequences. We begin with an example that bothillustrates the utility of inferring the kinds of structure we seek andshows what our techniques can do. Next we present an efficient andnon-obvious algorithm for identifying exact repetitions-including nestedrepetitions-in time which is linear with the length of the sequence.Then we describe a very simple algorithm for identifying interactionsbetween sequence elements. The focus of this paper is on how these twoalgorithms can work together, for their combination is far more powerfulthan either alone. We show how they combine to generate the kind ofstructure sought in the original motivating example. Although the twomethods work well together on many simple examples, the resultsfrequently conflict with intuition in the inference of branchingstructure. The minimum description length principle seems to provide theonly satisfactory general approach
机译:来自音乐的各种来源的各种音序 和文本到DNA和计算机程序,两种不同但相关的类型 结构的辨别力。首先,某些片段倾向于重复 确切地说,例如音乐中的主题,文本中的单词或短语,标识符 和计算机程序中的句法习语。二,这些细分 以可变但受约束的方式彼此交互。例如, 在英文文本中,仅某些句法词类可以出现在 “很多”词(例如动词)必不可少 排除在外。本文展示了如何推断出这类结构 自动从序列中提取。我们从一个例子开始 说明了推断我们寻求的结构种类的实用性,以及 展示了我们的技术可以做什么。接下来,我们介绍一个有效的 用于识别精确重复(包括嵌套)的非显而易见算法 重复时间与序列的长度成线性关系。 然后,我们描述了一种用于识别交互的非常简单的算法 序列元素之间。本文的重点是这两个方面 算法可以协同工作,因为它们的组合功能更强大 比任何一个单独。我们展示了它们如何结合以产生那种 原始激励示例中寻求的结构。虽然两个 方法可以在许多简单的示例中很好地协同工作,结果 在分支推理中经常与直觉冲突 结构体。最小描述长度原则似乎提供了 只有令人满意的一般方法

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号