Web Data Integration (HWS2015)
Data integration is one of the key challenges within most IT projects. Within the enterprise context, data integration problems arise whenever data from separate sources needs to be combined as the basis for new applications. Within the context of the Web, data integration techniques form the foundation for taking advantage of the ever-growing number of publicly accessible data sources and for enabling applications such as product comparison portals, location-based mashups, and data search engines.
In the course, students will learn techniques for integrating and cleansing data from large sets of heterogeneous data sources. The course will cover the following topics:
- Heterogeneity and Distributedness
- The Data Integration Process
- Structured Data on the Web
- Data Exchange Formats
- Schema Mapping and Data Translation
- Identity Resolution
- Data Quality Assessment
- Data Fusion
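Two of the topics above, identity resolution and data fusion, can be illustrated with a minimal Java sketch (Java 16+ for records). This is not the course's Java project template. The `Product` record, the example listings, the token-based Jaccard similarity, the 0.6 threshold and the conflict resolution rules (longest name, first non-null brand, lowest price) are all hypothetical choices made for illustration. The matcher compares every pair of records without blocking, so it would not scale to large sources.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class IdentityResolutionSketch {

    // Hypothetical record type for a product offered by one Web data source.
    record Product(String source, String name, String brand, Double price) {}

    // Lower-case a string and split it into alphanumeric tokens.
    static Set<String> tokens(String s) {
        Set<String> result = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B| over the two token sets.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    // Identity resolution: compare all cross-source pairs (no blocking) and
    // keep those whose name similarity reaches the threshold.
    static List<Product[]> match(List<Product> left, List<Product> right, double threshold) {
        List<Product[]> matches = new ArrayList<>();
        for (Product l : left) {
            for (Product r : right) {
                if (jaccard(l.name(), r.name()) >= threshold) {
                    matches.add(new Product[] {l, r});
                }
            }
        }
        return matches;
    }

    // Data fusion: resolve attribute conflicts with one rule per attribute
    // (longest name, first non-null brand, lowest non-null price).
    static Product fuse(Product a, Product b) {
        String name = a.name().length() >= b.name().length() ? a.name() : b.name();
        String brand = a.brand() != null ? a.brand() : b.brand();
        Double price;
        if (a.price() == null) {
            price = b.price();
        } else if (b.price() == null) {
            price = a.price();
        } else {
            price = Math.min(a.price(), b.price());
        }
        return new Product(a.source() + "+" + b.source(), name, brand, price);
    }

    public static void main(String[] args) {
        List<Product> shopA = List.of(
                new Product("A", "Apple iPhone 6 16GB Silver", "Apple", 599.0),
                new Product("A", "Samsung Galaxy S6", "Samsung", 549.0));
        List<Product> shopB = List.of(
                new Product("B", "iPhone 6 16GB Silver", null, 579.0),
                new Product("B", "Nokia Lumia 930", "Nokia", 399.0));

        for (Product[] pair : match(shopA, shopB, 0.6)) {
            System.out.println(fuse(pair[0], pair[1]));
        }
    }
}
```

With these example listings, only the two iPhone offers match (4 shared tokens out of 5, so a similarity of 0.8). The fused record takes the brand from shop A and the lower price, 579.0. Real projects usually add blocking, combine several attribute similarities, and choose conflict resolution functions per attribute based on data quality assessment.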
The course consists of a lecture together with accompanying practical exercises. In the exercises, the participants will gain expertise in applying state-of-the-art data integration techniques through a case study: a real-world Web data integration project. Students will work on their projects in teams and will report on their results in a written report as well as in an oral presentation.
Next Web Data Integration Course
Prof. Bizer is on sabbatical (research semester) in HWS2016. Thus, the Web Data Integration course is not offered in HWS2016. The course will next be offered in HWS2017.
Time and Location
- Wednesday, 15:30-17:00. Building: B6, Room: A 101 (Starting: 9.9.2015)
- Thursday, 10:15-11:45. Building: B6, Room: A 101 (Starting: 10.9.2015)
- Please register for the course via the ILIAS system.
Grading
- 50 % written exam
- 50 % project work
Course Material
- Slides: Introduction and Course Outline
- Slides: Structured Data on the Web
- Slides: Data Exchange Formats - Part 1
- Slides: Data Exchange Formats - Part 2
- Slides: Schema Mapping and Data Translation
- Slides: Introduction to Student Projects
- Slides: Introduction to MapForce
- Slides: Identity Resolution
- Slides: Project Phase 2: Identity Resolution
- Java Project Template: Identity Resolution
- Slides: Data Quality Assessment and Data Fusion
- Slides: Project Phase 3: Data Fusion
- Java Project Template: Data Fusion
- Video recordings of the Web Data Integration lectures are available.
Participation
- The course is open to students of the Master's program in Business Informatics.
- The course is restricted to 30 participants.
- Students can register by joining the ILIAS group.
- Basic programming skills in Java are required for the exercises.
Schedule
| Week of | Wednesday | Thursday |
|---|---|---|
| 7.9.2015 | Lecture: Introduction to Web Data Integration | Lecture: Structured Data on the Web |
| 14.9.2015 | Lecture: Data Exchange Formats | Lecture: Data Exchange Formats |
| 21.9.2015 | Lecture: Schema Mapping | Lecture: Schema Mapping |
| 28.9.2015 | Exercise: Introduction to Student Projects | Lecture: Introduction to MapForce |
| 5.10.2015 | Feedback about Project Outlines | Exercise: Data Translation |
| 12.10.2015 | Exercise: Data Translation | Exercise: Data Translation |
| 19.10.2015 | Lecture: Identity Resolution | Lecture: Identity Resolution |
| 26.10.2015 | Exercise: Identity Resolution | Exercise: Identity Resolution |
| 2.11.2015 | Exercise: Identity Resolution | Exercise: Identity Resolution |
| 9.11.2015 | Lecture: Data Quality and Data Fusion | Lecture: Data Quality and Data Fusion |
| 16.11.2015 | Exercise: Data Fusion | Exercise: Data Fusion |
| 23.11.2015 | Exercise: Data Fusion | Exercise: Data Fusion |
| 30.11.2015 | Exercise: Data Fusion | Exercise: Data Fusion |
| 7.12.2015 | Presentation of project results | Presentation of project results |
Course Evaluation
- HWS2015: results of the course evaluation by the participants.
- HWS2014: results of the course evaluation by the participants.
- HWS2013: results of the course evaluation by the participants.
Literature
- AnHai Doan, Alon Halevy, Zachary Ives: Principles of Data Integration. Morgan Kaufmann, 2012.
- Ulf Leser, Felix Naumann: Informationsintegration. dpunkt.verlag, 2007.
- Luna Dong, Divesh Srivastava: Big Data Integration. Morgan & Claypool, 2015.
- Serge Abiteboul et al.: Web Data Management. Cambridge University Press, 2012.
- Jérôme Euzenat, Pavel Shvaiko: Ontology Matching. Springer, 2007.
- Felix Naumann, Melanie Herschel: An Introduction to Duplicate Detection. Morgan & Claypool, 2010.
- Peter Christen: Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer, 2012.