The central challenge of today's information society is to take advantage of the deluge of information available on the Web as well as in enterprise contexts.

To help enterprises, research institutions, and individuals make use of the available information for their needs, the Research Group Data and Web Science conducts research on methods for managing, integrating, and mining large amounts of heterogeneous information in enterprise and open Web contexts.

Focus Areas

We conduct research in the following Focus Areas:

The group carries out research on methods for mining knowledge from large amounts of structured and unstructured data on the Web. To address the challenges that Web-scale data mining poses in terms of the size, heterogeneity, and dynamics of the data, the group focuses on supervised and unsupervised methods that combine logical and statistical inference. Existing and automatically acquired knowledge is used to facilitate data integration, enrichment, and cleansing, as well as to bootstrap the overall data mining process.

Co-Heads: Dr. Heiko Paulheim, Dr. Johanna Völker
Members: Daniel Fleischhacker, Andre Melo, Oliver Lehmberg, Robert Meusel, Petar Ristoski, Max Schmachtenberg


The group carries out fundamental and applied research on the development and application of AI methods and tools for interpreting and managing Web data. The focus is on inductive and deductive reasoning for information extraction and integration. The work ranges from logical reasoning using description logics and logic programming to statistical learning and inference methods, in particular statistical relational learning and log-linear models. The group applies reasoning methods to all kinds of data, from structured data to free text. A special interest of the group is distributed algorithms for large-scale reasoning.


Head: Dr. Christian Meilicke
Members: Erman Acar, Arnab Dutta, Rim Helaoui, Jan Noessner, Jörg Schönfisch

  • ZIM Project: Risk Management in Data Centers
  • Linking and Populating the Digital Humanities
  • Google Research Grant: Web Scale Information Extraction

The group conducts research on knowledge acquisition from heterogeneous Web sources, ranging from large raw text collections to collaboratively constructed resources (e.g., Wikipedia), and on its application to Natural Language Processing (NLP), Information Analysis and Retrieval. Areas of interest include “deep” NLP techniques for lexical semantics (Word Sense Disambiguation, ontology-based and distributional approaches to semantic similarity) as well as for document understanding and structuring (entity linking, co-reference resolution, discourse coherence, automatic summarization). The group applies NLP methods to support empirical research in the Social Sciences and Humanities.

Head: Prof. Dr. Simone Paolo Ponzetto
Members: Johannes Knopp, Michael Schuhmacher, Lydia Weiland, Cäcilia Zirn

  • Research and Science Center: Trust in Web Reviews
  • SFB 884: Political Economy of Reforms
  • Research Data Service Center

The group carries out research on all aspects of the Linked Open Data life cycle, including data alignment, interlinking, and fusion. More specific topics include schema mapping, data enrichment, mapping unstructured to structured data, identity resolution, data provenance, and data quality assessment. The group applies the methods it develops to integrate and clean Linked Data from the Web in the domains of digital humanities, social sciences, cultural heritage, and public government data.

Co-Heads: Dr. Volha Bryl, Dr. Kai Eckert
Members: Petar Petrovski, Dominique Ritze