Focus Group: Natural Language Processing and Information Retrieval (Prof. Ponzetto and Prof. Glavaš)

The NLP and IR group at DWS conducts research on integrating knowledge from heterogeneous Web sources – ranging from large raw text collections all the way through collaboratively constructed resources (e.g., Wikipedia) and knowledge bases (DBpedia, Freebase, etc.) – and its application to Natural Language Processing (NLP), Information Analysis and Retrieval tasks. Areas of interest include “deep” NLP techniques for text understanding, ranging from lexical and computational semantics (Word Sense Disambiguation, ontology-based and distributional meaning representations), over information extraction (entities and events), to document understanding and structuring (entity linking, ranking and search, automatic summarization). The group also applies NLP methods to support empirical research in Social Science and Humanities.





 *  joint project with the AI group

**  joint work with the Web-based Information Systems and Services @ HDM Stuttgart


  • SFB 884: Political Economy of Reforms
  • DFG Project JOIN-T: Joining Ontologies and semantics INduced from Text
  • Juniorprofessorenprogramm MWK Baden-Württemberg: Deep semantic models for high-end NLP applications
  • Elite Post-docs program of the Baden-Württemberg Stiftung: Knowledge consolidation and organization for query-specific Wikipedia construction
  • RiSC Programm MWK Baden-Württemberg: Vision and language understanding beyond literal meaning
  • MWFK BaWü Project: Part-Time Master Program: Data Science
  • Research and Science Center: Trust in Web Reviews
  • Research Data Service Center


Conference Item

  • Nicole Rae Berg, Will Lowe, Simone Paolo Ponzetto, Heiner Stuckenschmidt and Cäcilia Zirn Estimating Central Bank Preferences. In: Paper entry to the NLP Unshared Task in PoliInformatics, June 2014; 1-6. PoliInformatics Research Coordination Network (PInet), [Washington, DC], 2014.
  • Orphée De Clercq, Sven Hertling, Veronique Hoste, Simone Paolo Ponzetto and Heiko Paulheim Identifying Disputed Topics in the News. In: CEUR Workshop Proceedings, LD4KG 2014 : Proceedings of the 1st Workshop on Linked Data for Knowledge Discovery co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2014); Nancy, France, Sept. 19th, 2014; Paper 4. RWTH, Aachen, 2014.
  • Orphée De Clercq, Michael Schuhmacher, Simone Paolo Ponzetto and Veronique Hoste Exploiting FrameNet for content-based book recommendation. In: CEUR Workshop Proceedings, CBRecSys 2014 New Trends in Content-based Recommender Systems : Proceedings of the 1st Workshop on New Trends in Content-based Recommender Systems co-located with the 8th ACM Conference on Recommender Systems (RecSys 2014); 14-21. RWTH, Aachen, 2014.
  • Laura Dietz, Michael Schuhmacher and Simone Paolo Ponzetto Queripidia: Query-specific Wikipedia Construction. In: AKBC 2014 4th Workshop on Automated Knowledge Base Construction (AKBC) 2014 at NIPS 2014 in Montreal, Canada, December 13, 2014; 1-5. AKBC, Montréal, 2014.
  • Arnab Dutta, Christian Meilicke and Simone Paolo Ponzetto A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources. In: Lecture Notes in Computer Science, The Semantic Web: Semantics and Big Data : 11th International Conference, ESWC 2014, Anissaras, Crete, Greece, May 25-29, 2014, Proceedings; 286-301. Springer International Publ., Cham, 2014.
  • Arnab Dutta and Michael Schuhmacher Entity Linking for Open Information Extraction. In: Lecture Notes in Computer Science, Natural Language Processing and Information Systems : 19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014, Montpellier, France, June 18-20, 2014. Proceedings; 75-80. Springer Internat. Publ., Cham, 2014.
  • Dominique Ritze, Cäcilia Zirn, Colin Greenstreet, Kai Eckert and Simone Paolo Ponzetto Named Entities in Court: The MarineLives Corpus. In: Language Resources and Technologies for Processing and Linking Historical Documents and Archives - Deploying Linked Open Data in Cultural Heritage Workshop : associated with the LREC 2014 Conference, 26 - 30 May 2014, Reykjavik; 26-30. LREC, Reykjavik, 2014.
  • Michael Schuhmacher and Christian Meilicke Popular Books and Linked Data: Some Results for the ESWC’14 RecSys Challenge. In: Communications in Computer and Information Science, Semantic Web Evaluation Challenge : SemWebEval 2014 at ESWC 2014, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers; 176-181. Springer, Berlin [u.a.], 2014.
  • Michael Schuhmacher and Simone Paolo Ponzetto Knowledge-based graph document modeling. In: WSDM '14 : Proceedings of the 7th ACM International Conference on Web Search and Data Mining ; 543-552. ACM, New York, NY, 2014.
  • Michael Schuhmacher and Simone Paolo Ponzetto Ranking Entities in a Large Semantic Network. In: Lecture Notes in Computer Science, The Semantic Web: ESWC 2014 Satellite Events : ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers; 254-258. Springer, Cham, 2014.
  • Lydia Weiland, Wolfgang Effelsberg and Simone Paolo Ponzetto Weakly supervised construction of a repository of iconic images. In: The 3rd Annual Meeting Of The EPSRC Network On Vision & Language and The 1st Technical Meeting of the European Network on Integrating Vision and Language : A Workshop of the 25th International Conference on Computational Linguistics ; Proceedings; 68-73. Assoc. for Computational Linguistics, Stroudsburg, Pa., 2014.
  • Lydia Weiland, Felix Hanser and Ansgar Scherp Requirements Elicitation Towards a Search Engine for Semantic Multimedia Content. In: 2014 IEEE International Conference on Semantic Computing ; Proceedings ; 16–18 June 2014, Newport Beach, California ; 116-119. IEEE Computer Soc., Los Alamitos, Calif. [u.a.], 2014.
  • Lydia Weiland and Ansgar Scherp A Novel Approach for Semantics-Enabled Search of Multimedia Documents on the Web. In: Lecture Notes in Computer Science, MultiMedia Modeling : 20th Anniversary International Conference, MMM 2014, Dublin, Ireland, January 6-10, 2014, Proceedings, Part I; 50-61. Springer Internat. Publ., Cham, 2014.
  • Cäcilia Zirn, Michael Schäfer, Michael Strube, Simone Paolo Ponzetto and Heiner Stuckenschmidt Exploring structural features for position analysis in political discussions. In: Paper entry to the 2014 NLP Unshared Task in PoliInformatics, June 2014; 1-5. PoliInformatics Research Coordination Network (PInet), [Washington, DC], 2014.

Master and Bachelor Theses

This thesis should provide an in-depth overview of the various recurrent neural network models (fully recurrent networks, recursive networks, long...


This thesis should provide an in-depth overview of the state-of-the-art methods for representing knowledge graphs and knowledge bases in the (i.e.,...


Social networks are of high interest for many applications, ranging from simple user profiling to user-customized advertisement. In this thesis, we...


Continuous emotion detection is a core aspect of many real-world applications. In this work we will experiment with an existing interactive installation...


The goal of this thesis is to organize news from German news outlets so that events and salient topics in the news can be detected. The...


Convolutional neural networks have been shown to be very successful in various text classification tasks. The main shortcoming of CNNs used for text...


Recently the DWS group released a huge repository of hypernymy relations from the Web, the WebIsADb, containing a large...


In this thesis we will build upon and extend an annotation tool to conduct a user study and better understand the requirements towards image...


Object detection in images from news articles is a very challenging task. On the one hand, available training data for object detectors is only...


Introduction/problem: Speculation/hedging/vagueness identification plays a significant role in many applications, e.g. information extraction, machine...