Inferring lexical and grammatical structure from sequences


Abstract

In a wide variety of sequences from various sources, from music and text to DNA and computer programs, two different but related kinds of structure can be discerned. First, some segments tend to be repeated exactly, such as motifs in music, words or phrases in text, and identifiers and syntactic idioms in computer programs. Second, these segments interact with each other in variable but constrained ways. For example, in English text only certain syntactic word classes can appear after the word "the"; many parts of speech (such as verbs) are necessarily excluded. This paper shows how these kinds of structure can be inferred automatically from sequences. We begin with an example that both illustrates the utility of inferring the kinds of structure we seek and shows what our techniques can do. Next we present an efficient and non-obvious algorithm for identifying exact repetitions, including nested repetitions, in time that is linear in the length of the sequence. Then we describe a very simple algorithm for identifying interactions between sequence elements. The focus of this paper is on how these two algorithms can work together, for their combination is far more powerful than either alone. We show how they combine to generate the kind of structure sought in the original motivating example. Although the two methods work well together on many simple examples, the results frequently conflict with intuition in the inference of branching structure. The minimum description length principle seems to provide the only satisfactory general approach.
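To make the idea of exact, nested repetition concrete, here is a minimal Python sketch. It is not the linear-time algorithm the abstract refers to: it substitutes a naive offline scheme that repeatedly replaces the most frequent adjacent pair of symbols with a new rule, which costs roughly quadratic time. The function names (`count_pairs`, `infer_grammar`, `expand`) and the rule labels `R0`, `R1`, ... are illustrative assumptions, and the input is assumed to be a string of single-character symbols.

```python
def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair of symbols."""
    counts, last = {}, {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last.get(pair) == i - 1:
            continue  # overlaps the previously counted copy, as in "aaa"
        counts[pair] = counts.get(pair, 0) + 1
        last[pair] = i
    return counts


def infer_grammar(text):
    """Naive pair replacement (an assumption, not the paper's method).

    Returns (top_level_symbols, rules), where each rule maps a
    hypothetical label such as "R0" to the pair of symbols it replaces.
    """
    seq, rules = list(text), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # first pair with the highest count
        if counts[pair] < 2:
            break  # nothing repeats any more
        name = f"R{len(rules)}"
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbols, rules):
    """Rebuild the original string by recursively expanding rule labels."""
    return "".join(
        expand(rules[s], rules) if s in rules else s for s in symbols
    )


if __name__ == "__main__":
    top, rules = infer_grammar("abcabc")
    print(top, rules)
    print(expand(top, rules))
```

On the input `abcabc`, the sketch first introduces a rule for the pair `ab` and then a second rule that contains the first, which is the nested kind of repetition the abstract describes. Expanding the rules always reproduces the input, so the inferred hierarchy is a lossless description of the sequence.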


Bibliographic details

Source
Compression and Complexity of Sequences 1997 | 1998 | pp. 265-274 | 10 pages
Authors
Nevill-Manning C.G.; Witten I.H.
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

