The data needed to answer queries is often available through Web-based APIs. Indeed, a given query may be answerable from many Web-based sources that overlap in their vocabularies and differ in their access restrictions (required arguments) and cost. We introduce PDQ (Proof-Driven Query Answering), a system for determining a query plan in the presence of Web-based sources. It is (i) constraint-aware, exploiting relationships between sources to rewrite an expensive query into a cheaper one; (ii) access-aware, abiding by any access restrictions known for the sources; and (iii) cost-aware, making use of any cost information that is available about services. PDQ takes the novel approach of generating query plans from proofs that a query is answerable. We demonstrate the use of PDQ and its effectiveness in generating low-cost plans.