CS 709 Queripidia: Seminar on Wikipedia Construction (HWS 2015)

This course is about Wikipedia construction. We present a variety of topics from which students can choose one for a Seminar Thesis or Master Thesis.

Wikipedia construction is the task of automatically retrieving, extracting, and composing information into an encyclopedic article for a given query topic.


  • This seminar is organized by Prof. Dr. Simone Ponzetto and Dr. Laura Dietz
  • Open to master's students working on a Seminar Thesis or Master Thesis

In this seminar, you will

  • Read, understand, and explore the scientific literature of a topic of active research
  • Highlight the aspects of the literature that are useful for Wikipedia construction
  • Discuss with other participants how your topic relates to the other topics in the seminar


  • We will provide guidance and one-on-one meetings
  • Work individually throughout the semester: explore literature and write your thesis
  • Give presentations in the seminar about your literature survey and, for Master Theses, the progress of your research.


The papers and articles listed below serve as an entry point to each topic; you are expected to explore further relevant literature on your own.

General paper

  • Siddhartha Banerjee and Prasenjit Mitra. WikiKreator: Improving Wikipedia stubs automatically. In Proc. of ACL-AFNLP-2015, 2015.

Summarization from passages with Web data

Keywords: Multi-Document summarization; Query-specific summarization; Entity Summarization

  • Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Mayu Iwata. Overview of the NTCIR-11 MobileClick task. In Proc. of NTCIR-11, 2014.
  • Ani Nenkova and Kathleen McKeown. A survey of text summarization techniques. In Mining Text Data, pages 43-76. Springer, 2012.

Discourse, event chains and narratives for generating summaries

Keywords: Discourse, Narratives, Composing Text

  • Nathanael Chambers and Dan Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proc. of ACL-09, pages 602-610, 2009.
  • Jason M. Zalinger. Gmail as storyworld: How technology shapes your life narrative. PhD thesis, Rensselaer Polytechnic Institute, 2011.
  • Besnik Fetahu, Katja Markert, and Avishek Anand. Automated news suggestions for populating Wikipedia entity pages. In Proc. of CIKM-15, 2015.
  • Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Richard McCreadie, Virgil Pavlu, and Tetsuya Sakai. TREC 2014 temporal summarization track overview. In Proc. of TREC-2015, 2015.

Fluency, grammaticality

Keywords: readability, grammatical error correction

  • Christina Sauper and Regina Barzilay. Automatically generating Wikipedia articles: A structure-aware approach. In Proc. of ACL-IJCNLP-09, pages 208-216, 2009.
  • Hwee Tou Ng, Siew Mei Wu, Yuanbin Wu, Christian Hadiwinoto, and Joel Tetreault. The CoNLL-2013 shared task on grammatical error correction. In Proc. of the Seventeenth Conference on Computational Natural Language Learning (CoNLL-13), Sofia, Bulgaria, 2013.

Connections between entities from text and knowledge base

  • Roi Blanco and Hugo Zaragoza. Finding support sentences for entities. In Proc. of SIGIR-10, pages 339-346, 2010.
  • Nikos Voskarides, Edgar Meij, Manos Tsagkias, Maarten de Rijke, and Wouter Weerkamp. Learning to explain entity relationships in knowledge graphs. In Proc. of ACL-IJCNLP 2015, 2015.

Opinionated aspects and controversial opinions about entities

  • Shiri Dori-Hacohen and James Allan. Detecting controversy on the web. In Proc. of CIKM-13, 2013.
  • Rawia Awadallah, Maya Ramanath, and Gerhard Weikum. Harmony and dissonance: Organizing the people's voices on political controversies. In Proc. of WSDM-12, 2012.

Hierarchical topic detection / (Sub-)Heading prediction

Keywords: Clustering topic models / Newsgroup classification / Heading Prediction

  • Radityo Eko Prasojo, Mouna Kacimi, and Werner Nutt. Entity and aspect extraction for organizing news comments. In Proc. of CIKM-15, 2015.
  • Ridho Reinanda, Edgar Meij, and Maarten de Rijke. Mining, ranking and recommending entity aspects. In Proc. of SIGIR-15, 2015.
  • Wei Li and Andrew McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In Proc. of the 23rd International Conference on Machine Learning, 2006.
  • Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2006.

Information retrieval of Entities, Relations, Types, Categories

Keywords: Object retrieval: Entity Retrieval, Classification, Passage Retrieval

  • Jeffrey Dalton, Laura Dietz, and James Allan. Entity query feature expansion using knowledge base links. In Proc. of SIGIR-14, pages 365-374, 2014.
  • Rianne Kaptein and Jaap Kamps. Exploiting the category structure of Wikipedia for entity ranking. Artificial Intelligence, 194:111-129, 2013.
  • Michael Schuhmacher, Laura Dietz, and Simone Ponzetto. Ranking entities for web queries through text and knowledge. In Proc. of CIKM-15, 2015.
  • Chenyan Xiong and Jamie Callan. EsdRank: Connecting query and documents through external semi-structured data. In Proc. of CIKM-15, 2015.