We summarize progress on algorithms and software knowledge acquisition from large, distributed, autonomous, and semantically disparate information sources. Some key results include: scalable algorithms for constructing predictive models from data based on a novel decomposition of learning algorithms that interleaves queries for sufficient statistics from data with computations using the statistics; provably exact algorithms from distributed data (relative to their centralized counterparts); and statistically sound approaches to learning predictive models from partially specified data that arise in settings where the schema and the data semantics and hence the granularity of data differ across the different sources.
展开▼