CS 709 Text Analytics Seminar (HWS 2017: Vision and Language)

Much recent work in Natural Language Processing and Computer Vision has addressed problems and tasks that require new approaches to content understanding and generation in both modalities. Prime examples include integrated models of language and vision, as well as exploiting data and methods from each modality to help the other. In this seminar, we will accordingly focus on a recent body of research at the intersection of vision and language computing, aiming to understand the current trends and methods in this area.

Presentation schedule and additional information

The following is the schedule for the seminar presentations (each presentation should be at most 25 minutes long, followed by 20 minutes of discussion):

Monday, November 6

15:30 Nils Wilken: Black Holes and White Rabbits: Metaphor Identification with Visual Features

16:15 Jan Portisch: Visually Grounded Meaning Representations

Thursday, November 9

15:30 Kristiana Ristani: Multi-Task Video Captioning with Video and Entailment Generation

16:15 Nancy Kunath: Generative Adversarial Text to Image Synthesis

Monday, November 13

13:30 Ersejda Demirxhiu: Incorporating Global Visual Features into Attention-Based Neural Machine Translation

14:15 Yide Song: Multimodal Distributional Semantics

Monday, November 20

15:30 Hadeel Behiry: Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

16:15 Hasan Al Abed: Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

Monday, November 27

15:00 Flavia Kristo: Learning Visually Grounded Sentence Representations

15:45 Lu Lifei: From Red Wine to Red Tomato: Composition with Context

16:30 Zonqi Chen: Let Your Photos Talk: Generating Narrative Paragraph for Photo Stream via Bidirectional Attention Recurrent Neural Networks

Thursday, November 30

15:30 Mehmet Nur: Grounding of Textual Phrases in Images by Reconstruction

16:15 Zihe Cheng: Deep Visual-Semantic Alignments for Generating Image Descriptions

_______________________________________________________________________________________________

Students are obliged to attend all the seminar talks and are encouraged to participate in the discussions following the presentations.

Seminar papers need to be submitted by Friday, December 22 (23:59), via ILIAS. 

Your seminar grade will be a weighted combination of our evaluation of your presentation (45%), your submitted paper (45%), and your preparation for and participation in the discussions of the other talks (10%).
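
In other words, the final grade is the weighted sum

  \[
    \text{grade} = 0.45 \cdot \text{presentation} + 0.45 \cdot \text{paper} + 0.10 \cdot \text{participation}
  \]

so that, for example, hypothetical component scores of 1.7 (presentation), 2.0 (paper), and 1.0 (participation) would combine to 0.45 · 1.7 + 0.45 · 2.0 + 0.10 · 1.0 = 1.765.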

__________________________________________________________________________________________________

Seminar kick-off meeting

The kick-off meeting for the Seminar in Text Analytics (topic "Vision and Language") will be held on September 7, 2017, at 2 pm in our seminar room C1.01 in part C (first floor) of the B6 building.

_______________________________________________________________________________________________

Organization

Goals

In this seminar, you will

  • Read, understand, and explore scientific literature
  • Summarize a current research topic in a concise report
  • Give a focused presentation about a scientific publication

Requirements

  • The report has to be written in LaTeX (beginners are welcome; see the minimal skeleton after this list)
  • Report and presentation have to be in English
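
If you have not used LaTeX before, a minimal report skeleton along the following lines compiles with any standard LaTeX distribution (the title, author, and file names are placeholders, and the package choices are only a suggestion):

  \documentclass[11pt,a4paper]{article}
  \usepackage[utf8]{inputenc} % allow non-ASCII characters in the source
  \usepackage{graphicx}       % include figures
  \usepackage{natbib}         % author-year citations, e.g. \citep{bruni2014}

  \title{Seminar Report: Multimodal Distributional Semantics} % placeholder title
  \author{Your Name \\ University of Mannheim}

  \begin{document}
  \maketitle

  \begin{abstract}
  A one-paragraph summary of the surveyed topic.
  \end{abstract}

  \section{Introduction}
  % Your text goes here.

  \bibliographystyle{plainnat}
  \bibliography{references} % expects a references.bib file in the same folder

  \end{document}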

Schedule

  • Attend the kick-off meeting on September 7, 2017
  • Select your preferred paper and topic by September 15, 2017
  • Work on your presentation and report during the rest of the semester (details to be presented at the kick-off meeting)

Registration

Explore the list of topics below and select at least three topics of your preference. Send a ranked list of your selected topics via email to Goran Glavaš by September 6, 2017. We will confirm your registration as soon as possible. The actual topic assignment takes place at the kick-off meeting; our goal is, of course, to assign each of you one of your preferred topics.

Topics

The topics we offer are listed below. The references next to each topic serve as an entry point for your literature research; you are expected to explore further relevant literature on your own.

  • Grounding & Multimodal Semantics
  1. Multimodal Distributional Semantics. E. Bruni, N. Tran, and M. Baroni. JAIR 2014.
  2. Visually Grounded Meaning Representations. C. Silberer, V. Ferrari, and M. Lapata. PAMI 2016.
  3. Learning Visually Grounded Sentence Representations. D. Kiela, A. Conneau, A. Jabri, and M. Nickel. 2017.
  4. Grounding of Textual Phrases in Images by Reconstruction. A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. ECCV 2016.
  • Captioning (Image to Text)
  1. Deep Visual-Semantic Alignments for Generating Image Descriptions. A. Karpathy and L. Fei-Fei. CVPR 2015.
  2. Multi-Task Video Captioning with Video and Entailment Generation. R. Pasunuru and M. Bansal. ACL 2017.
  3. Let Your Photos Talk: Generating Narrative Paragraph for Photo Stream via Bidirectional Attention Recurrent Neural Networks. Y. Liu, J. Fu, T. Mei, and C. W. Chen. AAAI 2017.
  • Image Generation (Text to Image) and Analysis
  1. Generative Adversarial Text to Image Synthesis. S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. ICML 2016.
  2. From Red Wine to Red Tomato: Composition with Context. I. Misra, A. Gupta, and M. Hebert. CVPR 2017.
  • Visual Question Answering
  1. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. H. Xu and K. Saenko. ECCV 2016.
  2. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images. M. Malinowski, M. Rohrbach, and M. Fritz. ICCV 2015.
  • Multimodal NLP
  1. Incorporating Global Visual Features into Attention-Based Neural Machine Translation. I. Calixto, Q. Liu, and N. Campbell. EMNLP 2017.
  2. Black Holes and White Rabbits: Metaphor Identification with Visual Features. E. Shutova, D. Kiela, and J. Maillard. NAACL 2016.