Focus Group: Natural Language Processing and Information Retrieval (Prof. Ponzetto and Prof. Glavaš)

The NLP and IR group at DWS conducts research on methods for knowledge acquisition and natural language processing (NLP), as well as their application to support empirical research in (Computational) Social Sciences and (Digital) Humanities. In our work, we investigate a wide range of techniques for text understanding - from representation learning and distributional semantics to symbolic, entity-based approaches leveraging wide-coverage knowledge graphs - and apply them to research topics such as computational semantics, multilinguality, information retrieval and multimodal NLP, to name a few.

People

Faculty:

Staff:

Alumni:

Projects

Completed

Publications

Conference Item

  • Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto and Heiko Paulheim Biased graph walks for RDF graph embeddings. In: WIMS '17 Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics : Amantea, Italy, June 19 - 22, 2017; Article 21. ACM, New York, NY, 2017.
  • Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto and Heiko Paulheim Global RDF vector space embeddings. In: Lecture Notes in Computer Science. The Semantic Web – ISWC 2017 : 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, proceedings, part I; 190-207. Springer, Cham, 2017.
  • Alexander Diete, Timo Sztyler, Lydia Weiland and Heiner Stuckenschmidt Recognizing grabbing actions from inertial and video sensor data in a warehouse scenario. In: Procedia Computer Science. 14th International Conference on Mobile Systems and Pervasive Computing (MobiSPC 2017) / 12th International Conference on Future Networks and Communications (FNC 2017) / Affiliated Workshops; 16-23. Elsevier, Amsterdam [u.a.], 2017.
  • Stefano Faralli, Alexander Panchenko, Chris Biemann and Simone Paolo Ponzetto The ContrastMedium algorithm : taxonomy induction from noisy knowledge graphs with just a few links. In: 15th Conference of the European Chapter of the Association for Computational Linguistics : proceedings of conference, volume 1: Long Papers; 590-600. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Goran Glavaš, Federico Nanni and Simone Paolo Ponzetto Cross-lingual classification of topics in political texts. In: The 55th Annual Meeting of the Association for Computational Linguistics - proceedings of the Second Workshop on Natural Language Processing and Computational Social Science : August 3, 2017, Vancouver, Canada : ACL 2017; 42-46. Association for Computational Linguistics (ACL), Stroudsburg, PA, 2017.
  • Goran Glavaš, Federico Nanni and Simone Paolo Ponzetto Unsupervised cross-lingual scaling of political texts. In: 15th Conference of the European Chapter of the Association for Computational Linguistics : proceedings of conference, volume 1: Long Papers; 688-693. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Goran Glavaš and Simone Paolo Ponzetto Dual tensor model for detecting asymmetric lexico-semantic relations. In: EMNLP 2017 : Conference on Empirical Methods in Natural Language Processing : conference proceedings : Copenhagen, Denmark, September 7-11, 2017; 1757-1767. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Patrick Klein, Simone Paolo Ponzetto and Goran Glavaš Improving neural knowledge base completion with cross-lingual projections. In: 15th Conference of the European Chapter of the Association for Computational Linguistics : proceedings of conference, volume 1: Long Papers; 516-522. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Stephan Kopf, Mariia Zrianina, Benjamin Guthier, Lydia Weiland, Philipp Schaber, Simone Paolo Ponzetto and Wolfgang Effelsberg Enhancing bag of visual words with color information for iconic image classification. In: IPCV 2016 : proceedings of the 2016 International Conference on Image Processing, Computer Vision, & Pattern Recognition : WORLDCOMP '16, July 25-28, 2016, Las Vegas, Nevada, USA; 206-209. CSREA Press, Athens, GA, 2017.
  • Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto and Kai Eckert Investigating convolutional networks and domain-specific embeddings for semantic classification of citations. In: WOSP 2017 : 6th International Workshop on Mining Scientific Publications : Toronto, ON, Canada, June 19, 2017; 24-28. ACM, New York, NY, 2017.
  • Anne Lauscher, Federico Nanni and Simone Paolo Ponzetto Entitäten als Topic Labels : Verbesserung der Interpretierbarkeit und Evaluierbarkeit von Themen durch Kombinieren von Entity Linking und Topic Modeling. In: DHd 2017 : Digitale Nachhaltigkeit : Konferenzabstracts : Universität Bern 13. bis 18. Februar 2017 : 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.; 242-244. Verband Digital Humanities im deutschsprachigen Raum e.V., Bern, 2017.
  • Anne Lauscher, Federico Nanni and Simone Paolo Ponzetto SLaTE: a system for labeling topics with entities. In: Digital Humanities 2017 : conference abstracts : McGill University & Université de Montréal, Montréal, Canada, August 8 – 11, 2017; 742-743. McGill Université ; Université de Montréal, Montréal, 2017.
  • Stefano Menini, Federico Nanni, Simone Paolo Ponzetto and Sara Tonelli Topic-based agreement and disagreement in US electoral manifestos. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; 2928-2934. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Federico Nanni, Bhaskar Mitra, Matt Magnusson and Laura Dietz Benchmark for complex answer retrieval. In: ICTIR '17 : proceedings of the 3rd ACM International Conference on the Theory of Information Retrieval : October 1-4 2017, Amsterdam, Netherlands; 293-296. ACM, New York, NY, 2017.
  • Federico Nanni and Giulia Paci A discipline-enriched dataset for tracking the computational turn of European universities. In: WOSP 2017 : 6th International Workshop on Mining Scientific Publications : Toronto, ON, Canada, June 19, 2017; 29-33. ACM, New York, NY, 2017.
  • Federico Nanni, Simone Paolo Ponzetto and Laura Dietz Building entity-centric event collections. In: 2017 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017, Toronto, ON, Canada, June 19-23, 2017; 199-208. IEEE Computer Soc., Piscataway, NJ, 2017.
  • Sergiu Nisioi, Sanja Štajner, Simone Paolo Ponzetto and Liviu P. Dinu Exploring neural text simplification models. In: The 55th Annual Meeting of the Association for Computational Linguistics - proceedings of the conference : July 30-August 4, 2017, Vancouver, Canada : ACL 2017; 85-91. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Alexander Panchenko, Stefano Faralli, Simone Paolo Ponzetto and Chris Biemann Using linked disambiguated distributional networks for word sense disambiguation. In: EACL 2017 Workshop on Sense, Concept and Entity Representations and their Applications : proceedings of the workshop, April 4, 2017 Valencia, Spain; 72-78. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto and Chris Biemann Unsupervised, knowledge-free, and interpretable word sense disambiguation. In: The Conference on Empirical Methods in Natural Language Processing - proceedings of System Demonstrations : September 9-11, 2017, Copenhagen, Denmark : EMNLP 2017; 91-96. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto and Chris Biemann Unsupervised does not mean uninterpretable : the case for word sense induction and disambiguation. In: 15th Conference of the European Chapter of the Association for Computational Linguistics : proceedings of conference, volume 1: Long Papers; 86-98. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Petar Ristoski, Stefano Faralli, Simone Paolo Ponzetto and Heiko Paulheim Large-scale taxonomy induction using entity and word embeddings. In: WI 2017 : proceedings of the International Conference on Web Intelligence, Leipzig, Germany, August 23-26, 2017; 81-87. ACM, New York, NY, 2017.
  • Marco Rovera, Federico Nanni, Simone Paolo Ponzetto and Anna Goy Domain-specific named entity disambiguation in historical memoirs. In: CEUR Workshop Proceedings. CLiC-it 2017 : proceedings of the Fourth Italian Conference on Computational Linguistics, Rome, Italy, December 11-13, 2017; Paper 20. RWTH, Aachen, 2017.
  • Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto and Laura Dietz Using object detection, NLP, and knowledge bases to understand the message of images. In: Lecture Notes in Computer Science. MultiMedia Modeling : 23rd International Conference, MMM 2017, Reykjavik, Iceland, January 4-6, 2017, Proceedings, Part II; 405-418. Springer International Publishing, Cham, 2017.
  • Sanja Štajner, Mark Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso and Heiner Stuckenschmidt Sentence alignment methods for improving text simplification systems. In: The 55th Annual Meeting of the Association for Computational Linguistics - proceedings of the conference : July 30-August 4, 2017, Vancouver, Canada : ACL 2017; 97-102. Association for Computational Linguistics, Stroudsburg, PA, 2017.
  • Sanja Štajner, Goran Glavaš, Simone Paolo Ponzetto and Heiner Stuckenschmidt Domain adaptation for automatic detection of speculative sentences. In: IEEE 11th International Conference on Semantic Computing ICSC 2017 : 30 January - 1 February 2017, San Diego, California : proceedings; 164-171. IEEE Computer Society, Los Alamitos, CA [u.a.], 2017.
  • Sanja Štajner, Simone Paolo Ponzetto and Heiner Stuckenschmidt Automatic assessment of absolute sentence complexity. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI 2017) : Melbourne, Australia 19-25 August 2017; 4096-4102. International Joint Conferences on Artificial Intelligence, Melbourne, 2017.

Article

Master and Bachelor Theses

This thesis should provide an in-depth overview of the various recurrent neural network models (fully recurrent networks, recursive networks, long...

This thesis should provide an in-depth overview of the state-of-the-art methods for representing knowledge graphs and knowledge bases in the (i.e.,...

Social networks are of high interest for many applications, ranging from simple user profiling to user-customized advertisement. In this thesis, we...

Continuous emotion detection is a core aspect of many real-world applications. In this work we will experiment with an existing interactive installation...

The goal of this thesis is to organize news from German news outlets so as to detect events and salient topics in the news. The...

Convolutional neural networks have been shown to be very successful in various text classification tasks. The main shortcoming of CNNs used for text...

Recently, the DWS group released a huge repository of hypernymy relations from the Web, the WebIsADb (http://webdatacommons.org/isadb/), containing a large...

In this thesis we will build upon and extend an annotation tool to conduct a user study and better understand the requirements towards image...

Object detection in images from news articles is a very challenging task. On the one hand, available training data for object detectors is only...

Introduction/problem: Speculation/hedging/vagueness identification plays a significant role in many applications, e.g. information extraction, machine...
