CENG 641

Data Integration

This course begins by explaining why data integration is hard, covering systems, logical, and social reasons. It then covers the fundamentals of data integration: languages for resolving heterogeneity, automatic schema mapping techniques, query processing in heterogeneous systems, and novel architectures for data integration. Finally, it discusses the commercial state of the art in this area and dataspaces.

Course Objectives

Today, organizations need to access information from heterogeneous data sources through a single uniform interface. This course discusses the basics and main challenges of data integration, together with state-of-the-art technologies and future directions.

Recommended or Required Reading

J.M. HELLERSTEIN, M.J. FRANKLIN, S. CHANDRASEKARAN, A. DESHPANDE, K. HILDRUM, S. MADDEN, V. RAMAN and M.A. SHAH; Adaptive Query Processing: Technology in Evolution, IEEE Data Engineering Bulletin, vol. 23, no. 2, 2000.
Y.E. IOANNIDIS and E. WONG; Query Optimization by Simulated Annealing, Proc. of ACM SIGMOD Conf., 1987.
O.M. DUSCHKA; Query Planning and Optimization in Information Integration, Ph.D. thesis, Stanford University, Stanford, California, 1998.
M.J. FRANKLIN, B.T. JONSSON and D. KOSSMANN; Performance Tradeoffs for Client-Server Query Processing, SIGMOD Conference, 1998, pp. 9-18.
R. GOLDMAN and J. WIDOM; WSQ/DSQ: A Practical Approach for Combined Querying of Databases and the Web, Proc. of ACM SIGMOD Conf., 2000.
L.M. HAAS, D. KOSSMANN, E.L. WIMMERS and J.Y. YANG; Optimizing Queries Across Diverse Data Sources, Proc. of the VLDB Conference, 1997, pp. 276-285.
S. CHAUDHURI and K. SHIM; Query Optimization in the Presence of Foreign Functions, Proc. of the VLDB, 1993.
A.Y. HALEVY; Answering Queries Using Views: A Survey, VLDB Journal, 2001, pp. 270-294.
Z. IVES, A. HALEVY and D. WELD; Adapting to Source Properties in Processing Data Integration Queries, Proc. of ACM SIGMOD International Conference on Management of Data, June 2004.
C. LI; Computing Complete Answers to Queries in the Presence of Limited Access Patterns, The VLDB Journal, vol. 12, 2003, pp. 211-227.
L.F. MACKERT and G.M. LOHMAN; R* Optimizer Validation and Performance Evaluation for Distributed Queries, Proc. of the 12th Int. Conf. on VLDB, 1986, pp. 149-159.
A. RAJARAMAN, Y. SAGIV and J. ULLMAN; Answering Queries Using Templates with Binding Patterns, Proc. of ACM PODS, San Jose, CA, 1995.
I. MANOLESCU, L. BOUGANIM, F. FABRET and E. SIMON; Efficient Querying of Distributed Resources in Mediator Systems, Proc. of the Confederated International Conferences DOA, CoopIS and ODBASE, LNCS 2519, Springer-Verlag, 2002, pp. 468-485.
T. OZSU and P. VALDURIEZ; Principles of Distributed Database Systems, Prentice Hall, 2013.

Learning Outcomes

1. To learn the underlying approaches and methods behind all functionalities of a data integration system

2. To acquire the ability to read, comprehend, and discuss key publications in the field of data integration

3. To develop solution proposals for a research problem in the domain of data integration

4. To implement the proposed solution for the selected research problem, evaluate its performance, and present the findings in both written and oral forms as a technical report

Topics
What is Data Integration?
Challenges in Data Integration
Modelling Data Sources
Modelling Data Sources
Automatic Schema Mapping
Query Processing in Data Integration Systems
Query Processing in Data Integration Systems
Query Optimization in Data Integration Systems
Query Optimization in Data Integration Systems
Architectures for Data Integration
Architectures for Data Integration
Dataspaces
Commercial Data Integration Systems
Evolution of Data Integration Systems