The central challenge of today's information society is to take advantage of the deluge of information that is available on the Web as well as in enterprise contexts.
To help enterprises, research institutions and individuals make use of the available information for their needs, the Research Group Data and Web Science conducts research on methods for managing, integrating and mining large amounts of heterogeneous information within enterprise and open Web contexts.
We conduct research on the following topics:
- Data Integration
- Data Mining
- Text Analysis
- Ontologies and Inference
- Web Technologies
- User Interaction
- Data-intensive Applications
The Research Group Data and Web Science is organized in three focus areas, each covering a specific aspect of the group's overall research goal:
Focus Area: Artificial Intelligence
The focus area Artificial Intelligence, headed by Prof. Stuckenschmidt, consists of three subgroups covering different areas of AI research.
Reasoning and Learning (Dr. Christian Meilicke)
The group carries out fundamental and applied research on the development and application of AI methods and tools to the problem of web data interpretation and management. The focus is on inductive and deductive reasoning for information extraction and integration. The work ranges from logical reasoning using description logics and logic programming to statistical learning and inference methods, in particular statistical relational learning and log-linear models. The group applies these reasoning methods to all kinds of data, ranging from structured data to free text. A special interest of the group is distributed algorithms for large-scale reasoning.
Language Technology and Information Retrieval (Prof. Simone Paolo Ponzetto)
The group carries out research on knowledge acquisition from large text collections and collaboratively constructed resources, and on Natural Language Processing for information analysis and retrieval. Areas of interest include lexical semantics (word sense disambiguation, knowledge-based and distributional approaches to semantic similarity) as well as discourse semantics and pragmatics (entity disambiguation and classification, coreference resolution, discourse coherence, automatic document summarization). The group applies NLP methods to support empirical research in the Social Sciences and Humanities.
Linked Data and Information Mining (Dr. Johanna Völker)
The group is concerned with the creation, management and improvement of semistructured data on the Web using techniques from data mining and computational linguistics. The current focus of the work is on schema induction and information debugging in the context of linked data. Application areas of the research include smart web portals and linked data for data center management.
Focus Area: Web-based Systems
The focus area Web-based Systems, headed by Prof. Dr. Christian Bizer, explores technical and economic questions concerning the development of global, decentralized information environments. The current main topics of our research are:
- The evolution of the World Wide Web from a medium for the publication of textual documents into a medium for sharing structured data, as well as the role of Linked Data technologies within this transition.
- The shift from classic data integration architectures to Enterprise Data Spaces which provide for the flexible, pay-as-you-go integration of large numbers of internal and external data sources.
We contribute to various open data publishing and open source software projects. These include the W3C Linking Open Data (LOD) project, which coordinates the extension of the Web with a global data space by publishing open-license datasets as RDF on the Web and by setting data links between items within different data sources; the DBpedia project, which extracts a large multilingual knowledge base from Wikipedia; and the Web Data Commons project, which extracts structured information from over 1.5 billion web pages. Our open source software projects include the LDIF – Linked Data Integration Framework, the Silk – Link Discovery Framework, as well as the D2RQ Platform.
We participate in the following third-party funded research projects: LOD2 – Creating Knowledge out of Interlinked Data, an EU-FP7-IP project which develops tools and methodologies for exposing and managing large amounts of structured data on the Web; the DM2E – Digitized Manuscripts to Europeana project, which develops an interoperability infrastructure for converting library metadata from a diverse range of source formats into the Europeana Data Model (EDM); and the PlanetData Network of Excellence, which establishes a European community of researchers around the topic of large-scale data management.
Focus Area: Media Informatics
The focus area Media Informatics is headed by Juniorprof. Dr. habil. Ansgar Scherp. Our work is at the intersection of processing and managing large-scale semantic data and human-centered (multimedia) applications. A central problem that hinders the adoption of semantic technologies and semantic applications is the lack of proper usability of such applications, in particular when dealing with very large and distributed semantic data of differing origin and quality (also known as Big Data).
We develop novel methods and tools for the efficient processing and management of large-scale semantic data, and we develop interactive, knowledge-based applications that ease the use of such data. We evaluate the knowledge-based applications by conducting formative and summative user studies and applying statistical methods. Examples of this beneficial combination of large-scale semantic data management and human-centered applications are the comparison of a semantic desktop vs. a standard desktop for personal information management (KCAP '09), the interactive exploration of large-scale semantic social media data with SemaPlorer (JWS '09), deriving semantics from photo books to improve the authoring and retrieval task (ACM MM '07), the interactive exploration of distributed social media sources (MTAP '13), and interactive user guidance when searching for relevant sources of Linked Open Data at web scale (JWS '12, KCAP '13).
In detail, the research interests are:
- Processing and managing large-scale semantic data
- Interactive knowledge-based (multimedia) applications
- Interactive semantic mobile (multimedia) applications
Juniorprofessor: Ansgar Scherp
PhD Student: Lydia Weiland