The Data and Web Science Group is hosting the Data Science Conference LWDA 2018 in Mannheim on August 22-24, 2018.

LWDA, which stands for "Lernen, Wissen, Daten, Analysen" ("Learning, Knowledge, Data, Analytics"), covers recent research in areas such as knowledge discovery, machine learning & data mining, knowledge management, database management & information systems, and information retrieval.

The LWDA conference is organized by, and brings together, the various special interest groups of the Gesellschaft für Informatik (German Computer Science Society) in this area. The program comprises joint research sessions and keynotes as well as workshops organized by each special interest group.

Further information can be found on the conference website: https://www.uni-mannheim.de/lwda-2018/.

Download the conference poster.

news-2131 Wed, 11 Jul 2018 16:36:52 +0000 Paper accepted at ISWC 2018: Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/paper-accepted-at-iswc-2018-fine-grained-evaluation-of-rule-and-embedding-based-systems-for-knowle/ The paper "Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion" by Christian Meilicke, Manuel Fink, Yanjie Wang, Daniel Ruffinelli, Rainer Gemulla, and Heiner Stuckenschmidt has been accepted at the 2018 International Semantic Web Conference (ISWC).

In recent years, embedding methods have attracted increasing attention as a means for knowledge graph completion. Similarly, rule-based systems have been studied for this task in the past. What is missing so far is a common evaluation that includes more than one type of method. We close this gap by comparing representatives of both types of systems in a frequently used evaluation protocol. Leveraging the explanatory qualities of rule-based systems, we present a fine-grained evaluation that gives insight into characteristics of the most popular datasets and points out the different strengths and shortcomings of the examined approaches. Our results show that models such as TransE, RESCAL or HolE have problems in solving certain types of completion tasks that can be solved by a rule-based approach with high precision. At the same time, there are other completion tasks that are difficult for rule-based systems. Motivated by these insights, we combine both families of approaches via ensemble learning. The results support our assumption that the two methods complement each other in a beneficial way.

news-2124 Wed, 27 Jun 2018 06:03:30 +0000 Mannheim Students Score Second Place at Data Mining Cup https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/mannheim-students-score-second-place-at-data-mining-cup/ The Data Mining Cup is an annual data mining competition for students from all over the world. Since 2014, students from Mannheim have taken part in the competition as an integral part of the Data Mining 2 lecture, held by Prof. Paulheim. In the course of the competition, the students have to solve a data mining task based on real e-commerce data.

This year, the data was provided by an online sports apparel retailer, and the task was to predict the sellout date for individual articles. Students had six weeks to develop their solution. In the course of the lecture, they worked in different teams and had regular discussions about solution approaches and results.

One of the student teams from Mannheim qualified for the final round of the 10 best teams in May and was invited to present their solution in Berlin at the prudsys personalization & pricing summit. In the final ranking, they placed second out of 197 solutions in total. Overall, teams from 148 universities in 47 countries took part in the 2018 Data Mining Cup.

The DWS group wants to congratulate the winning team:

  • Nele Ecker
  • Thilo Habrich
  • Andreea Iana
  • Adrian Kochsiek
  • Alexander Luetke
  • Laurien Theresa Lummer
  • Nils Richter
  • Fabian Oliver Schmitt

Picture: Members of the winning team in Berlin. Left to right: Nele Ecker, Laurien Lummer, Adrian Kochsiek, Alexander Lütke
Picture credits: Data Mining Cup/prudsys AG

news-2123 Fri, 22 Jun 2018 09:35:32 +0000 JCDL 2018 - Vannevar Bush Best Paper Award https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/jcdl-2018-vannevar-bush-best-paper-award/ Our paper "Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context" has recently won the Vannevar Bush best paper award at the 2018 Joint Conference on Digital Libraries (JCDL), the top conference in the field of digital libraries!

The work, coauthored by Federico Nanni, Simone Paolo Ponzetto and Laura Dietz, is part of a collaboration between the DWS group and the University of New Hampshire in the context of an Elite Post-Doc grant of the Baden-Württemberg Stiftung recently awarded to Laura.

Congratulations also to Myriam Traub, Thaer Samar, Jacco van Ossenbruggen and Lynda Hardman, whose work shares the 2018 best paper award with ours!

news-2119 Mon, 11 Jun 2018 13:23:27 +0000 Papers accepted at ACL 2018 https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/papers-accepted-at-acl-2018/ We have three papers to be presented at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), the premier international conference on Computational Linguistics and Natural Language Processing.

Two short papers prepared in collaboration with our colleagues from the University of Cambridge, the University of Hamburg and the University of Oslo have been accepted at the main conference track:

One paper has been accepted at the 3rd Workshop on Representation Learning for NLP (RepL4NLP) hosted by ACL 2018:

  • Samuel Broscheit: Learning Distributional Token Representations from Visual Features.

news-2143 Fri, 08 Jun 2018 15:22:00 +0000 Lydia Weiland has defended her PhD thesis https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/lydia-weiland-has-defended-her-phd-thesis-2/ Lydia Weiland has successfully defended her PhD thesis on "Understanding the Message of Images" today. The key contribution of the thesis is a knowledge-rich, graph-based approach to automatically capture the message (gist) of images, using the content and structure of a knowledge base to bridge the gap between text and image understanding. Congratulations, Lydia!


We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or in news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that have previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image or image-text understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. Specifically, we approach the problem using a pipeline that: i) links detected object labels in the image and concept mentions in the caption to nodes of the knowledge base; ii) builds a semantic graph out of these 'seed' concepts; iii) applies a series of graph expansion and clustering steps on the original semantic graph to include additional concepts and topics within the semantic representation; iv) combines several graph-based and text-based features into a concept ranking model that pinpoints the gist concepts. Understanding the gist can be useful for tasks such as image search and recommending images for texts.
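The four pipeline steps can be illustrated with a minimal Python sketch. This is not the thesis implementation: the toy knowledge base `TOY_KB`, the function names, exact-name linking, one-hop expansion, and the hand-set weights `w_graph` and `w_text` are all assumptions made for illustration. The clustering step is omitted, and the thesis describes a ranking model that combines several features rather than a fixed weighted sum.

```python
# Hypothetical toy knowledge base: concept -> neighbouring concepts.
# (Illustrative stand-in for a Wikipedia-derived graph.)
TOY_KB = {
    "polar bear": {"arctic", "global warming", "bear"},
    "ice floe": {"arctic", "global warming", "sea ice"},
    "arctic": {"polar bear", "ice floe", "global warming"},
    "global warming": {"polar bear", "ice floe", "arctic", "climate"},
    "bear": {"polar bear"},
    "sea ice": {"ice floe"},
    "climate": {"global warming"},
}


def link_to_kb(mentions, kb):
    """Step i (simplified): keep mentions that match a KB node by lowercased name."""
    return {m.lower() for m in mentions if m.lower() in kb}


def expand(seeds, kb):
    """Steps ii-iii (simplified): seed nodes plus their one-hop neighbours."""
    nodes = set(seeds)
    for seed in seeds:
        nodes |= kb.get(seed, set())
    return nodes


def rank_concepts(object_labels, caption_mentions, kb, w_graph=1.0, w_text=0.5):
    """Step iv (simplified): score candidates with one graph and one text feature."""
    caption_seeds = link_to_kb(caption_mentions, kb)
    seeds = link_to_kb(object_labels, kb) | caption_seeds
    scores = {}
    for concept in expand(seeds, kb):
        graph_feature = len(kb.get(concept, set()) & seeds)  # seed neighbours
        text_feature = 1.0 if concept in caption_seeds else 0.0
        scores[concept] = w_graph * graph_feature + w_text * text_feature
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


ranking = rank_concepts(["Polar bear", "Ice floe"], ["Arctic"], TOY_KB)
print(ranking[:2])
```

In this toy example the top-ranked concept is "global warming", which appears neither in the image labels nor in the caption: the kind of connotation the pipeline is meant to surface.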

As gist detection is a novel task, to the best of our knowledge, there is no dataset available. Thus, we create a dataset allowing for simultaneous evaluation of literal and non-literal image-caption pairs. The gold standard gist concepts are from a common knowledge base (Wikipedia) and the provided ranks are detailed with levels 0 to 5, which supports various benchmarking tasks, e.g., ranking according to different levels of granularity and classification. Furthermore, as our proposed gist detection pipeline touches on different research areas, we provide a detailed gold standard for each of our pipeline steps, such as entity linking or object detection in the images. Our gist detection pipeline is evaluated in a detailed ablation study, investigating aspects of twelve different research questions. These are elaborated in the evaluation section via human assessment or cross-validation and provide detailed insights into the gist of image-caption pairs. Furthermore, we show in an end-to-end setting the feasibility of state-of-the-art methods combined with our gist detection pipeline and point to future research directions.

Our experiments show that the candidate selection and ranking of gist concepts is a more difficult problem for non-literal image-caption pairs than for literal image-caption pairs. Furthermore, we demonstrate that using features and concepts from both modalities (image and caption) improves the performance for all types of pairs, a finding which is in line with results from research on multimodal approaches for other related tasks. Additionally, a feature ablation study shows the complementary nature and usefulness of different types of features, which are collected from different kinds of semantic graphs of increasing richness. Finally, we experimented with a state-of-the-art image object detector and caption generator to evaluate the performance of an end-to-end solution for our task. The results indicate that using state-of-the-art open-domain image understanding provides us with an input that is good enough to detect gist concepts of image-caption pairs, with nearly half of the predicted gist concepts being relevant. However, they also demonstrate that improved object detectors could avoid a drop of 38% in mean average precision. Additionally, the caption contains useful hints, especially for non-literal pairs.
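For readers unfamiliar with the metric mentioned above, the following is a small Python sketch of mean average precision as it is commonly defined for ranking tasks. The function names and the toy rankings are illustrative assumptions and are not taken from the thesis, whose exact evaluation setup is not described here.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which a relevant item occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries (image-caption pairs) with toy concept rankings.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = 1/2
]
print(round(mean_average_precision(runs), 4))
```

A ranking that places relevant concepts earlier scores higher, which is why errors in upstream steps such as object detection show up directly in this metric.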

Gist image identification is a small, yet arguably crucial part of the much bigger task of interpreting images beyond their denotation. Within a use case scenario of an established research problem, we show that gist detection in the form of concept ranking is useful for downstream tasks such as multimedia indexing, in that it outperforms shallow and deep approaches. Finally, we conclude that it could be useful also for image search and recommendation.

news-2108 Mon, 07 May 2018 06:56:32 +0000 Roche Hypo University Challenge won by DWS-AI https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/roche-hypo-university-challenge-won-by-dws-ai/ We are happy to announce that Jakob Huber and Timo Sztyler took first place in the Hypo University Challenge, which was hosted by Roche Diabetes Care GmbH and powered by IBM. The goal of the challenge was to develop an algorithm that predicts the probability of a nocturnal hypoglycemic event (severe, mild, hypo) in the upcoming 10, 20, 30, 40, and 60 minutes.


Today, more than 425 million people have diabetes mellitus, a metabolic disorder characterized by an increased blood sugar level. Left untreated, it can lead to hyperglycemia, which can result in confusion, abdominal pain, and coma. Diabetes requires lifelong treatment, i.e., there is no cure.


After the challenge, they were invited to present their solution approach as part of the Roche internal "Diagnostics R&D Fair" in Basel where they also received a trophy for winning the challenge.

news-2098 Tue, 17 Apr 2018 09:27:36 +0000 Paper accepted at IJCAI 2018 https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/paper-accepted-at-ijcai-2018/ Together with our colleagues Paola, Irene and Stefano at Sapienza University in Rome we have a paper accepted at the 27th International Joint Conference on Artificial Intelligence (IJCAI), the premier conference in the field of AI:

  • Stefano Faralli, Irene Finocchi, Simone Paolo Ponzetto and Paola Velardi: Efficient Pruning of Large Knowledge Graphs.

news-2097 Tue, 17 Apr 2018 09:24:14 +0000 Paper accepted at JCDL 2018 https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/paper-accepted-at-jcdl-2018/ We have a paper accepted at the 2018 Joint Conference on Digital Libraries (JCDL), the top conference in the field of digital libraries:

  • Federico Nanni, Simone Paolo Ponzetto and Laura Dietz: Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context.

The work presented in the paper is a collaboration between the DWS group and Prof. Laura Dietz at the University of New Hampshire in the context of an Elite Post-Doc grant of the Baden-Württemberg Stiftung recently awarded to Laura.



news-2096 Tue, 17 Apr 2018 09:08:19 +0000 Paper accepted at SIGIR 2018 https://dws.informatik.uni-mannheim.deen/news/singleview/detail/News/paper-accepted-at-sigir-2018/ Together with our colleague Ivan Vulic at the University of Cambridge we have a paper accepted at the 41st International ACM Conference on Research and Development in Information Retrieval (SIGIR), the premier conference in the field of Information Retrieval:

  • Robert Litschko, Goran Glavas, Ivan Vulic and Simone Paolo Ponzetto: Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only.
