News - NLP and IR

Focus Group: Natural Language Processing and Information Retrieval (Prof. Ponzetto and Prof. Glavaš)

The NLP and IR group at DWS conducts research on integrating knowledge from heterogeneous Web sources, ranging from large raw text collections through collaboratively constructed resources (e.g., Wikipedia) to knowledge bases (DBpedia, Freebase, etc.), and on applying that knowledge to Natural Language Processing (NLP), Information Analysis and Retrieval tasks. Areas of interest include “deep” NLP techniques for text understanding, ranging from lexical and computational semantics (Word Sense Disambiguation, ontology-based and distributional meaning representations), through information extraction (entities and events), to document understanding and structuring (entity linking, ranking and search, automatic summarization). The group also applies NLP methods to support empirical research in the Social Sciences and Humanities.

People

Faculty:

Staff:

Alumni:


Projects

Publications

Conference Item

  • Alexander Diete, Timo Sztyler, Lydia Weiland and Heiner Stuckenschmidt. Improving motion-based activity recognition with ego-centric vision. In: 2018 IEEE International Conference on Pervasive Computing and Communications : PerCom 2018, Athens, Greece, March 19-23, 2018 : PerCom Workshops proceedings; tba. IEEE Computer Society, Piscataway, NJ, 2018.
  • Federico Nanni, Goran Glavaš, Simone Paolo Ponzetto, Sara Tonelli, Nicolò Conti, Ahmet Aker, Alessio Palmero Aprosio, Arnim Bleier, Benedetta Carlotti, Theresa Gessler, Tim Henrichsen, Dirk Hovy, Christian Kahmann, Mladen Karan, Akitaka Matsuo, Stefano Menini and Don Nguyen. Findings from the hackathon on understanding euroscepticism through the lens of textual data. In: Proceedings of the LREC 2018 Workshop ParlaCLARIN : Miyazaki, Japan, Monday 7th May 2018; 1-8. LREC, Miyazaki, Japan, 2018.
  • Federico Nanni, Mahmoud Osman, Yi-Ru Cheng, Simone Paolo Ponzetto and Laura Dietz. UKParl: A Data Set for Topic Detection with Semantically Annotated Text. In: Proceedings of the LREC 2018 Workshop ParlaCLARIN : Miyazaki, Japan, Monday 7th May 2018; 1-4. LREC, Miyazaki, Japan, 2018.
  • Federico Nanni, Simone Paolo Ponzetto and Laura Dietz. Entity-aspect linking : providing fine-grained semantics of entities in context. In: Joint Conference on Digital Libraries 2018 : June 3-6, 2018 in Fort Worth, Texas : Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL); 1-10. ACM, New York, NY, 2018.
  • Alexander Panchenko, Dmitry Ustalov, Stefano Faralli, Simone Paolo Ponzetto and Chris Biemann. Improving hypernymy extraction with distributional semantic classes. In: LREC 2018, 11th International Conference on Language Resources and Evaluation : 7-12 May 2018, Miyazaki (Japan); 1541-1551. European Language Resources Association, ELRA-ELDA, Paris, 2018.
  • Christoph Kilian Theil, Sanja Štajner and Heiner Stuckenschmidt. Word embeddings-based uncertainty detection in financial disclosures. In: ACL 2018 Workshop on Economics and Natural Language Processing (ECONLP); tba. Association for Computational Linguistics, Stroudsburg, PA, 2018.
  • Christoph Kilian Theil, Sanja Štajner, Heiner Stuckenschmidt and Simone Paolo Ponzetto. Automatic detection of uncertain statements in the financial domain. In: Lecture notes in computer science ; 10761 + 10762. Computational Linguistics and Intelligent Text Processing : 18th International Conference, CICLing 2017, Budapest, Hungary, April 17–23, 2017, Revised Selected Papers, Part I / II; tba. Springer International Publishing, Cham, 2018.
  • Dmitry Ustalov, Alexander Panchenko, Andrei Kutuzov, Chris Biemann and Simone Paolo Ponzetto. Unsupervised semantic frame induction using triclustering. In: The 56th Annual Meeting of the Association for Computational Linguistics : ACL 2018 : proceedings of the conference, vol. 2 (short papers) : July 15 - 20, 2018 Melbourne, Australia; 55-62. Association for Computational Linguistics, Stroudsburg, PA, 2018.
  • Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann and Simone Paolo Ponzetto. An unsupervised word sense disambiguation system for under-resourced languages. In: LREC 2018, 11th International Conference on Language Resources and Evaluation : 7-12 May 2018, Miyazaki (Japan); 1018-1022. European Language Resources Association, ELRA-ELDA, Paris, 2018.

Master and Bachelor Theses

The "ius commune" or "learned laws" (= "Roman and canon law" of the Middle Ages) are full of citations which follow a set of generally common rules...


The Web offers a goldmine of information describing a multitude of companies whose products and services can be potentially matched against Web users’...


Recently, there has been much interest in exploiting Web-scale resources like the CommonCrawl for intelligent text processing and information extraction...


Recently, we started investigating methods and frameworks to automatically extract high-quality hypernym relations from Web-scale amounts of data,...


Entity linking, the task of linking mentions of entities in text to wide-coverage concept repositories like DBpedia or Freebase, has so far...
