CS 709 Text Analytics Seminar (FSS 2018: Statistical Machine Translation)

In this seminar we will look at major milestones in the development of Statistical Machine Translation (SMT) systems from the past 15 years. Covered topics will include phrase-based and neural MT, evaluation and applications of MT models.

Seminar kick-off meeting

The kick-off meeting for the Seminar in Text Analytics (topic "Statistical Machine Translation") will be held on 15.2 at 15:30 in our seminar room C1.01 in part C (first floor) of the B6 building.




Registration opens on 31.1 (23:59) and closes on 7.2 (23:59). Register by email to Sanja Stajner (NOTE: emails sent before 31.1 do not count as registration!).

Explore the list of topics (online by 31.1) and select at least 3 papers of your preference. Send a ranked list of your selected papers via email to Sanja Stajner by February 7. We will confirm your registration as soon as possible. The actual topic assignment takes place at the kick-off meeting; our goal is, of course, to assign you papers based on your preferences.


  • Select your preferred paper and topic by 07.02
  • Attend the kick-off meeting on 15.02
  • Attend all meetings: 08.03, 15.03, 22.03, 12.04, 19.04, 26.04 (14:00-15:30 in C1.01)
  • Work on your presentation and report during the rest of the semester (details to be presented in the kick-off)
  • Submit your report by 16.06


Topics by dates

08.03: E2+E3; E3+E6

15.03: E1+E5; E4+E5

22.03: S3+S4

12.04: S1+S2; N1

19.04: N2; N3

26.04: N4; N5


In this seminar, you will

  • Read, understand, and explore scientific literature
  • Summarize a current research topic in a concise report
  • Give a focused presentation about a scientific publication


  • Previous attendance of Text Analytics is recommended.
  • Basic maths (algebra and probability theory) is a must.
  • Report and presentation have to be in English.


Explore the list of topics and select at least 3 papers of your preference. Send a ranked list of your selected papers via email to Sanja Stajner by February 7.

The papers serve as an entry point to your presentation; you are expected to explore related relevant literature on your own.


Statistical MT

  • [S2] Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alexander M. Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, Dragomir R. Radev: A Smorgasbord of Features for Statistical Machine Translation. HLT-NAACL 2004: 161-168. http://aclweb.org/anthology/N/N04/N04-1021.pdf

Neural MT

  • [N3] Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio: On the Properties of Neural Machine Translation: Encoder–Decoder Approaches. SSST-8 2014. https://arxiv.org/pdf/1409.1259.pdf


  • [E1] Chi-kiu Lo, Dekai Wu: MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles. ACL 2011: 220-229. http://www.aclweb.org/anthology/P11-1023