This paper develops a query language for sequence databases, such as genome databases and text databases. The language, called Sequence Datalog, extends classical Datalog with interpreted function Symbols for manipulating sequences. It has both a clear operational and Declaratie semantics, based on a new notion called the extended Active domain of a database. The extended domain contains all the Sequences in the database and all their subsequences. This idea leads To a clear distinction between safe and unsafe recursion over sequen- Ces: safe recursion stays inside the extended active domain, while Unsafe recursion does not. By carefully limiting the amount of unsafe Recursion, the paper develops a safe and expressive subset of Sequence Datalog. As part of the development, a new type of transducer is intro- Duced, called a generalized sequence transducer.
展开▼