RSS-Feed en-gb TYPO3 News Wed, 21 Nov 2018 06:52:14 +0000 Wed, 21 Nov 2018 06:52:14 +0000 TYPO3 EXT:news news-2164 Thu, 04 Oct 2018 20:27:13 +0000 New DFG Project on joining graph- and vector-based sense representations for semantic end-user information access https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/new-dfg-project-on-joining-graph-and-vector-based-sense-representations-for-semantic-end-user-infor/ We are happy to announce that the Deutsche Forschungsgemeinschaft accepted our proposal for extending a joint research project on hybrid semantic representations together with our friends and colleagues of the Language Technology Group of the University of Hamburg.

The project, titled "Joining graph- and vector-based sense representations for semantic end-user information access" (JOIN-T 2), builds upon our JOIN-T project (also funded by DFG) and aims to bring it one step forward. Our vision for the next three years is to explore ways to produce semantic representations that combine the interpretability of manually crafted resources and sparse representations with the accuracy and high coverage of dense neural embeddings.

Stay tuned for forthcoming research papers and resources!

Research Topics - Artificial Intelligence (NLP) Simone
news-2166 Thu, 04 Oct 2018 08:52:00 +0000 WInte.r Web Data Integration Framework Version 1.3 released https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/winter-web-data-integration-framework-version-13-released/ We are happy to announce the release of Version 1.3 of the Web Data Integration Framework (WInte.r).

WInte.r is a Java framework for end-to-end data integration. The framework implements a wide variety of different methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation. The methods are designed to be easily customizable by exchanging pre-defined building blocks, such as blockers, matching rules, similarity functions, and conflict resolution functions.
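
The interplay of these building blocks can be illustrated independently of WInte.r's Java API. The following Python sketch is only an illustration under stated assumptions: the function names (`block_key`, `jaccard`, `match`), the record layout, and the threshold are hypothetical and are not taken from the framework. It shows a blocker that limits which record pairs are compared, a similarity function, and a threshold-based matching rule.

```python
from collections import defaultdict


def block_key(record):
    """Hypothetical blocker: group records by the first letter of the name."""
    return record["name"][:1].lower()


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match(left, right, threshold=0.5):
    """Matching rule: compare only records in the same block and keep
    pairs whose similarity reaches the (assumed) threshold."""
    blocks = defaultdict(list)
    for record in right:
        blocks[block_key(record)].append(record)
    matches = []
    for record in left:
        for candidate in blocks.get(block_key(record), []):
            score = jaccard(record["name"], candidate["name"])
            if score >= threshold:
                matches.append((record["id"], candidate["id"], score))
    return matches


# Made-up example records, for illustration only.
left = [{"id": "a1", "name": "Blade Runner"}, {"id": "a2", "name": "Alien"}]
right = [
    {"id": "b1", "name": "Blade Runner 2049"},
    {"id": "b2", "name": "Brazil"},
    {"id": "b3", "name": "Aliens"},
]
result = match(left, right)
```

Swapping `block_key` or `jaccard` for another function changes the behaviour without touching `match`, which is the kind of customization by exchanging building blocks that the paragraph above describes.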

The following features have been added to the framework for the new release:

  • Value Normalization: New ValueNormaliser class for normalizing quantifiers and units of measurement. New DataSetNormaliser class for detecting data types and transforming complete datasets into a normalised base format.
  • External Rule Learning: In addition to learning matching rules directly inside WInte.r, the new release also supports learning matching rules using external tools such as RapidMiner and importing the learned rules back into WInte.r.
  • Debug Reporting: The new release features detailed reports about the application of matching rules, blockers, and data fusion methods which lay the foundation for fine-tuning the methods.
  • Step-by-Step Tutorial: In order to get users started with the framework, we have written a step-by-step tutorial on how to use WInte.r for identity resolution and data fusion and how to debug and fine-tune the different steps of the integration process.

The WInte.r framework forms a foundation for our research on large-scale web data integration. The framework is used by the T2K Match algorithm for matching millions of Web tables against a central knowledge base, as well as within our work on Web table stitching for improving matching quality. The framework is also used in the context of the DS4DM research project for matching tabular data for data search.

Besides being used for research, the WInte.r framework is also used for teaching. The students of our Web Data Integration course use the framework to solve case studies and implement their term projects.

Detailed information about the WInte.r framework is found at

The WInte.r framework can be downloaded from the same web site. The framework can be used under the terms of the Apache 2.0 License.

Many thanks to Alexander Brinkmann and Oliver Lehmberg for their work on the new release as well as on the tutorial and extended documentation in the WInte.r wiki.

Research - Web-based Systems Chris Projects
news-2151 Tue, 28 Aug 2018 12:34:04 +0000 Paper accepted at EMNLP 2018 https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/paper-accepted-at-emnlp-2018/ Our long paper submission

"Investigating the Role of Argumentation in the Rhetorical Analysis of Scientific Publications with Neural Multi-Task Learning Models " (Anne Lauscher, Goran Glavaš, Kai Eckert, and Simone Paolo Ponzetto)

got accepted at the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), one of the top-tier conferences in natural language processing!

Simone Publications Kai Topics - Artificial Intelligence (NLP) Research - Data Analytics
news-2150 Fri, 24 Aug 2018 14:27:52 +0000 André Melo has defended his PhD thesis https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/andre-melo-has-defended-his-phd-thesis/ André Melo has defended his PhD thesis on "Automatic Refinement of Large-Scale Cross-Domain Knowledge Graphs", supervised by Prof. Heiko Paulheim.

In his thesis, André has developed different methods to improve large-scale, cross-domain knowledge graphs along various dimensions. His contributions include, among others, a benchmarking suite for knowledge graph completion and correction, an effective method for type prediction using hierarchical classification, and a machine-learning-based method for detecting wrong relation assertions. Moreover, he has proposed methods for error correction in knowledge graphs, and for distilling high-level tests from the individual errors identified.

As of September, André will start a new job as a knowledge engineer for Babylon Health in London. We wish him all the best!

Group Research
news-2131 Wed, 11 Jul 2018 16:36:52 +0000 Paper accepted at ISWC 2018: Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/paper-accepted-at-iswc-2018-fine-grained-evaluation-of-rule-and-embedding-based-systems-for-knowle/ The paper "Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion" by Christian Meilicke, Manuel Fink, Yanjie Wang, Daniel Ruffinelli, Rainer Gemulla, and Heiner Stuckenschmidt has been accepted at the 2018 International Semantic Web Conference (ISWC).

Over the recent years, embedding methods have attracted increasing focus as a means for knowledge graph completion. Similarly, rule-based systems have been studied for this task in the past. What is missing so far is a common evaluation that includes more than one type of method. We close this gap by comparing representatives of both types of systems in a frequently used evaluation protocol. Leveraging the explanatory qualities of rule-based systems, we present a fine-grained evaluation that gives insight into characteristics of the most popular datasets and points out the different strengths and shortcomings of the examined approaches. Our results show that models such as TransE, RESCAL or HolE have problems in solving certain types of completion tasks that can be solved by a rule-based approach with high precision. At the same time, there are other completion tasks that are difficult for rule-based systems. Motivated by these insights, we combine both families of approaches via ensemble learning. The results support our assumption that the two methods complement each other in a beneficial way.
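
For readers unfamiliar with embedding-based completion, TransE, one of the models named above, treats a relation as a translation in vector space: a triple (head, relation, tail) is plausible when head + relation lies close to tail. The following Python sketch only illustrates that scoring idea. The two-dimensional embeddings are made up for the example, the names `transe_score` and `rank_tails` are hypothetical, and none of this is the evaluation code or data used in the paper.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between head + relation and tail.
    Higher (closer to zero) means a more plausible triple."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Toy embeddings, invented for illustration only.
entities = {
    "berlin": [1.0, 0.0],
    "germany": [1.0, 1.0],
    "paris": [0.0, 0.0],
    "france": [0.0, 1.0],
}
capital_of = [0.0, 1.0]


def rank_tails(head, relation):
    """Rank all entities as candidate tails for (head, relation, ?)."""
    return sorted(
        entities,
        key=lambda e: transe_score(entities[head], relation, entities[e]),
        reverse=True,
    )


ranking = rank_tails("berlin", capital_of)
```

In this toy setting the best-ranked tail for ("berlin", capital_of, ?) is "germany", because adding the relation vector to the head lands exactly on that entity's embedding.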

Publications Rainer
news-2124 Wed, 27 Jun 2018 06:03:30 +0000 Mannheim Students Score Second Place at Data Mining Cup https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/mannheim-students-score-second-place-at-data-mining-cup/ The Data Mining Cup is an annual data mining competition for students from all over the world. Since 2014, students from Mannheim have taken part in the competition as an integral part of the Data Mining 2 lecture, held by Prof. Paulheim. In the course of the competition, the students have to solve a data mining task based on real e-commerce data.

This year, the data was provided by an online sports apparel retailer, and the task was to predict the sellout date for individual articles. Students had six weeks to develop their solution. In the course of the lecture, they worked in different teams and had regular discussions about solution approaches and results.

One of the student teams from Mannheim qualified for the final round of the 10 best teams in May and was invited to present their solution in Berlin at the prudsys personalization & pricing summit. In the final ranking, they scored second out of 197 solutions in total. Overall, teams from 148 universities from 47 countries took part in the 2018 Data Mining Cup.

The DWS group wants to congratulate the winning team:

  • Nele Ecker
  • Thilo Habrich
  • Andreea Iana
  • Adrian Kochsiek
  • Alexander Luetke
  • Laurien Theresa Lummer
  • Nils Richter
  • Fabian Oliver Schmitt

Picture: Members of the winning team in Berlin. Left to right: Nele Ecker, Laurien Lummer, Adrian Kochsiek, Alexander Lütke
Picture credits: Data Mining Cup/prudsys AG

Group Research
news-2123 Fri, 22 Jun 2018 09:35:32 +0000 JCDL 2018 - Vannevar Bush Best Paper Award https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/jcdl-2018-vannevar-bush-best-paper-award/ Our paper "Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context" has recently won the Vannevar Bush best paper award at the 2018 Joint Conference on Digital Libraries (JCDL), the top conference in the field of digital libraries!

The work, coauthored by Federico Nanni, Simone Paolo Ponzetto and Laura Dietz, is part of a collaboration between the DWS group and the University of New Hampshire in the context of an Elite Post-Doc grant of the Baden-Württemberg Stiftung recently awarded to Laura.

Congratulations also to Myriam Traub, Thaer Samar, Jacco van Ossenbruggen and Lynda Hardman, who, with their work, share with us the 2018 best paper award!

Simone Publications Research
news-2119 Mon, 11 Jun 2018 13:23:27 +0000 Papers accepted at ACL 2018 https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/papers-accepted-at-acl-2018/ We have three papers to be presented at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), the premier international conference on Computational Linguistics and Natural Language Processing.

Two short papers prepared in collaboration with our colleagues from the University of Cambridge, the University of Hamburg and the University of Oslo have been accepted at the main conference track:

One paper has been accepted at the 3rd Workshop on Representation Learning for NLP (RepL4NLP) hosted by ACL 2018:

  • Samuel Broscheit: Learning Distributional Token Representations from Visual Features.
Research Publications Simone
news-2143 Fri, 08 Jun 2018 15:22:00 +0000 Lydia Weiland has defended her PhD thesis https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/lydia-weiland-has-defended-her-phd-thesis-2/ Lydia Weiland has successfully defended her PhD thesis on "Understanding the Message of Images" today. The key contribution of the thesis is a knowledge-rich, graph-based approach to automatically capture the message (gist) of images using the content and structure of a knowledge base to bridge the gap between text and image understanding. Congratulations, Lydia!


We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that have previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image or image-text understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. Specifically, we approach the problem using a pipeline that: i) links detected object labels in the image and concept mentions in the caption to nodes of the knowledge base; ii) builds a semantic graph out of these ‘seed’ concepts; iii) applies a series of graph expansion and clustering steps on the original semantic graph to include additional concepts and topics within the semantic representation; iv) combines several graph-based and text-based features into a concept ranking model that pinpoints the gist concepts. Understanding the gist can be useful for tasks such as image search and recommending images for texts.

As gist detection is a novel task, to the best of our knowledge, there is no dataset available. Thus, we create a dataset allowing for simultaneous evaluation of literal and non-literal image-caption pairs. The gold standard gist concepts are from a common knowledge base (Wikipedia) and the provided ranks are detailed with levels 0 to 5, which supports various benchmarking tasks, e.g., ranking according to different levels of granularity and classification. Furthermore, as our proposed gist detection pipeline touches on different research areas, we provide a detailed gold standard for each of our pipeline steps, such as entity linking or object detection in the images. Our gist detection pipeline is evaluated in a detailed ablation study, investigating aspects of twelve different research questions. These are elaborated in the evaluation section via human assessment or cross-validation and provide detailed insights into the gist of image-caption pairs. Furthermore, we show in an end-to-end setting the feasibility of state-of-the-art methods combined with our gist-detection pipeline and point to future research directions.

Our experiments show that the candidate selection and ranking of gist concepts is a more difficult problem for non-literal image-caption pairs than for literal image-caption pairs. Furthermore, we demonstrate that using features and concepts from both modalities (image and caption) improves the performance for all types of pairs – a finding which is in line with results from research on multimodal approaches for other related tasks. Additionally, a feature ablation study shows the complementary nature and usefulness of different types of features, which are collected from different kinds of semantic graphs of increasing richness. Finally, we experimented with a state-of-the-art image object detector and caption generator to evaluate the performance of an end-to-end solution for our task. The results indicate that using state-of-the-art open-domain image understanding provides us with an input that is good enough to detect gist concepts of image-caption pairs, with nearly half of the predicted gist concepts being relevant. However, it also demonstrates that improved object detectors could avoid a drop of 38% mean-average precision. Additionally, the caption contains useful hints especially for non-literal pairs.

Gist image identification is a small, yet arguably crucial part of the much bigger task of interpreting images beyond their denotation. Within a use case scenario of an established research problem, we show that gist detection in the form of concept ranking is useful for downstream tasks such as multimedia indexing, in that it outperforms shallow and deep approaches. Finally, we conclude that it could be useful also for image search and recommendation.

Group Research
news-2108 Mon, 07 May 2018 06:56:32 +0000 Roche Hypo University Challenge won by DWS-AI https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/roche-hypo-university-challenge-won-by-dws-ai/ We are happy to announce that Jakob Huber and Timo Sztyler took first place in the Hypo University Challenge that was hosted by Roche Diabetes Care GmbH and powered by IBM. The goal of the challenge was to develop an algorithm that predicts the probability of a nocturnal hypoglycemic event (severe, mild, hypo) in the upcoming 10, 20, 30, 40, and 60 minutes.


Today, more than 425 million people have Diabetes Mellitus, a metabolic disorder characterized by an increased blood sugar level. Left untreated, it can lead to hyperglycemia, which results in confusion, abdominal pain, and coma. Diabetes treatment is lifelong, i.e., there is no cure.


After the challenge, they were invited to present their solution approach as part of the Roche internal "Diagnostics R&D Fair" in Basel where they also received a trophy for winning the challenge.