CS 709 Queripidia: Seminar on Wikipedia Construction (HWS 2015)

This is a course about Wikipedia construction. We present a variety of topics from which students can choose for a Seminar Thesis or Master Thesis.

Wikipedia construction is the task of automatically retrieving, extracting, and composing information into an encyclopedic article for a given query topic.


  • This seminar is organized by Prof. Dr. Simone Ponzetto and Dr. Laura Dietz
  • Open to master's students working on a Seminar Thesis or Master Thesis
  • The kick-off meeting takes place on Sept 22 at 13:00 in B6,26, room A2.07. Please contact us as soon as possible if you would like to participate in the seminar but cannot attend the kick-off meeting.


In this seminar, you will

  • Read, understand, and explore the scientific literature of a topic of active research
  • Highlight aspects of the literature that are useful for Wikipedia construction
  • Discuss with other participants how your topic relates to the other topics in the seminar


  • Select your preferred topics and register by email (see below)
  • Attend the kick-off meeting (to be disclosed)
  • We will provide guidance and one-on-one meetings
  • Work individually throughout the semester: explore literature and write your thesis
  • Give presentations in the seminar about your literature survey and, for Master Theses, the progress of your research.


Explore the list of topics and register your interest in the seminar in general, as well as in particular topics, by email to Laura Dietz. Final assignments will be made at the kick-off meeting. Otherwise, topics will be assigned on a first-come, first-served basis.


The papers and articles listed below serve as an entry point to a topic; you are expected to explore related relevant literature on your own.

General paper

  • Siddhartha Banerjee and Prasenjit Mitra. WikiKreator: Improving Wikipedia stubs automatically. In Proc. of ACL-AFNLP-2015, 2015.

Summarization from passages with Web data

Keywords: Multi-Document summarization; Query-specific summarization; Entity Summarization

  • Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Mayu Iwata. Overview of the NTCIR-11 MobileClick task. In Proc. of NTCIR-11, 2014.
  • Ani Nenkova and Kathleen McKeown. A survey of text summarization techniques. In Mining Text Data, pages 43-76. Springer, 2012.

Discourse, event chains and narratives for generating summaries

Keywords: Discourse, Narratives, Composing Text

  • Nathanael Chambers and Dan Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proc. of ACL-09, pages 602-610, 2009.
  • Jason M. Zalinger. Gmail as storyworld: How technology shapes your life narrative. PhD thesis, Rensselaer Polytechnic Institute, 2011.
  • Besnik Fetahu, Katja Markert, and Avishek Anand. Automated news suggestions for populating Wikipedia entity pages. In Proc. of CIKM-15, 2015.
  • Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Richard McCreadie, Virgil Pavlu, and Tetsuya Sakai. TREC 2014 temporal summarization track overview. In Proc. of TREC-2015, 2015.

Fluency, grammaticality

Keywords: readability, grammatical error correction

  • Christina Sauper and Regina Barzilay. Automatically generating Wikipedia articles: A structure-aware approach. In Proc. of ACL-IJCNLP-09, pages 208-216, 2009.
  • Hwee Tou Ng, Siew Mei Wu, Yuanbin Wu, Christian Hadiwinoto, and Joel Tetreault. The CoNLL-2013 shared task on grammatical error correction. In Proc. of the Seventeenth Conference on Computational Natural Language Learning (CoNLL-2013), Sofia, Bulgaria, 2013.

Connections between entities from text and knowledge base

  • Roi Blanco and Hugo Zaragoza. Finding support sentences for entities. In Proc. of SIGIR-10, pages 339-346, 2010.
  • Nikos Voskarides, Edgar Meij, Manos Tsagkias, Maarten de Rijke, and Wouter Weerkamp. Learning to explain entity relationships in knowledge graphs. In Proc. of ACL-IJCNLP 2015, 2015.

Opinionated aspects and controversial opinions about entities

  • Shiri Dori-Hacohen and James Allan. Detecting controversy on the web. In Proc. of CIKM-13, 2013.
  • Rawia Awadallah, Maya Ramanath, and Gerhard Weikum. Harmony and dissonance: Organizing the people's voices on political controversies. In Proc. of WSDM-12, 2012.

Hierarchical topic detection / (Sub-)Heading prediction

Keywords: Clustering topic models / Newsgroup classification / Heading Prediction

  • Radityo Eko Prasojo, Mouna Kacimi, and Werner Nutt. Entity and aspect extraction for organizing news comments. In Proc. of CIKM-15, 2015.
  • Ridho Reinanda, Edgar Meij, and Maarten de Rijke. Mining, ranking and recommending entity aspects. In Proc. of SIGIR-15, 2015.
  • Wei Li and Andrew McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In Proc. of the 23rd International Conference on Machine Learning. ACM, 2006.
  • Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2006.

Information retrieval of Entities, Relations, Types, Categories

Keywords: Object retrieval: Entity Retrieval, Classification, Passage Retrieval

  • Jeffrey Dalton, Laura Dietz, and James Allan. Entity query feature expansion using knowledge base links. In Proc. of SIGIR-14, pages 365-374, 2014.
  • Rianne Kaptein and Jaap Kamps. Exploiting the category structure of Wikipedia for entity ranking. Artificial Intelligence, 194:111-129, 2013.
  • Michael Schuhmacher, Laura Dietz, and Simone Ponzetto. Ranking entities for web queries through text and knowledge. In Proc. of CIKM-15, 2015.
  • Chenyan Xiong and Jamie Callan. EsdRank: Connecting query and documents through external semi-structured data. In Proc. of CIKM-15, 2015.