The paper develops EDITOR, a language for manipulating semistructured documents, such as those typically available on the Web. EDITOR programs are based on two simple ideas, taken from text editors: “search” instructions are used to select regions of interest in a document, and “cut & paste” instructions to restructure them. We study the expressive power and the complexity of these programs. We show that they are computationally complete, in the sense that any computable document restructuring can be expressed in EDITOR. We also study the complexity of a safe subclass of programs, showing that it captures exactly the class of polynomial-time restructurings. The language has been implemented in Java and is currently used in the ARANEUS project as a basis for a wrapper-generation toolkit.
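To make the “search, then cut & paste” idea concrete, here is a minimal sketch in Python. It is not EDITOR syntax and not the ARANEUS implementation. The helpers `search`, `cut`, `paste`, and `swap_title_and_author` are hypothetical. Python regular expressions stand in for EDITOR's search instructions, and regions are assumed to be simple `(begin, end)` character offsets into a flat string.

```python
import re
from typing import Optional, Tuple

Region = Tuple[int, int]  # hypothetical: (begin, end) character offsets


def search(doc: str, pattern: str, start: int = 0) -> Optional[Region]:
    """Select the region of the first regex match at or after `start`, or None."""
    m = re.compile(pattern).search(doc, start)
    return (m.start(), m.end()) if m else None


def cut(doc: str, region: Region) -> Tuple[str, str]:
    """Remove `region` from `doc`; return (new_doc, clipboard)."""
    begin, end = region
    return doc[:begin] + doc[end:], doc[begin:end]


def paste(doc: str, pos: int, text: str) -> str:
    """Insert `text` into `doc` at offset `pos`."""
    return doc[:pos] + text + doc[pos:]


def swap_title_and_author(doc: str) -> str:
    """Hypothetical restructuring: '<b>Title</b> by Author' -> 'Author: <b>Title</b>'."""
    author = search(doc, r"(?<= by ).*$")  # select the text after " by "
    if author is None:
        return doc  # nothing to restructure
    doc, clip = cut(doc, author)              # cut the author name
    doc, _ = cut(doc, search(doc, r" by $"))  # drop the now-dangling " by "
    return paste(doc, 0, clip + ": ")         # paste the author at the front


print(swap_title_and_author("<b>Title</b> by Author"))
```

Each step only selects a region and then moves text. This reflects the abstract's point that restructurings are built from these two editor-like primitives. The sketch makes no attempt to model EDITOR's full instruction set, its safe subclass, or its complexity bounds.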