Master Thesis: Cross-lingual Topic-Based Manifesto Classification (Nanni/Ponzetto)

Target: Master

Type: Experiments

Introduction/problem: Party manifestos (https://manifestoproject.wzb.eu/) present a party's positions on a range of topics. Manifestos written in English (e.g. those of US and UK parties) have been annotated at topic level, but the same annotations are not always available for other languages.

Goal: Classify manifestos written in other languages by topic, using the annotated English manifestos as the training set.

Approach: The goal of this thesis is to train a cross-lingual classifier that learns topics from English manifestos and detects them in manifestos in other languages, for example German, using word embeddings and translation matrices. The classifier is trained on English documents represented by embedding features. To apply it to a text in another language, the text is first mapped into the English embedding space with the translation matrix. Evaluation will be on the few manifestos in other languages that have been labeled with topics.
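Illustration: The pipeline described in the approach can be sketched on toy data as follows. This is a minimal sketch, not the method the thesis must follow. It assumes that the translation matrix is fitted by least squares on a small seed dictionary of word pairs (one common way to obtain such a matrix), that a document is represented by the average of its word vectors, and that a nearest-centroid rule stands in for the classifier. All vectors, words, topic names and function names below are made up for the example; real experiments would use pretrained embeddings and the annotated manifestos.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy embedding size (assumption); real embeddings are much larger

# Toy English embeddings: each word lies close to the centre of its topic.
TOPIC_CENTRES = {"economy": np.eye(DIM)[0], "environment": np.eye(DIM)[1]}
EN_WORDS = {
    "economy": ["tax", "jobs", "growth", "trade"],
    "environment": ["climate", "forest", "energy", "water"],
}
en_vecs = {
    word: TOPIC_CENTRES[topic] + 0.1 * rng.normal(size=DIM)
    for topic, words in EN_WORDS.items()
    for word in words
}

# Toy German embeddings: the English vectors under a hidden rotation, standing
# in for a German embedding space that was trained independently.
TRANSLATIONS = {
    "steuer": "tax", "arbeit": "jobs", "wachstum": "growth", "handel": "trade",
    "klima": "climate", "wald": "forest", "energie": "energy", "wasser": "water",
}
rotation, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))
de_vecs = {de: en_vecs[en] @ rotation for de, en in TRANSLATIONS.items()}


def learn_translation_matrix(src, tgt):
    """Least-squares W such that src @ W is as close as possible to tgt."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def embed_document(tokens, vectors):
    """Represent a document as the average of its known word vectors."""
    return np.mean([vectors[t] for t in tokens if t in vectors], axis=0)


def train_centroids(doc_vecs, labels):
    """Nearest-centroid 'classifier': one mean vector per topic."""
    return {
        lab: np.mean([v for v, l in zip(doc_vecs, labels) if l == lab], axis=0)
        for lab in sorted(set(labels))
    }


def predict(doc_vec, centroids):
    """Return the topic whose centroid has the highest cosine similarity."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(centroids, key=lambda lab: cosine(doc_vec, centroids[lab]))


# 1. Fit the German -> English translation matrix on a seed dictionary.
#    "handel" and "wasser" are held out to check that the mapping generalises.
seed = [(de, en) for de, en in TRANSLATIONS.items() if de not in ("handel", "wasser")]
W = learn_translation_matrix(
    np.array([de_vecs[de] for de, _ in seed]),
    np.array([en_vecs[en] for _, en in seed]),
)

# 2. Train on labeled English documents only.
train_docs = [
    (["tax", "jobs", "growth"], "economy"),
    (["trade", "tax"], "economy"),
    (["climate", "forest"], "environment"),
    (["energy", "water", "climate"], "environment"),
]
centroids = train_centroids(
    [embed_document(tokens, en_vecs) for tokens, _ in train_docs],
    [label for _, label in train_docs],
)


# 3. Classify a German document by mapping it into the English space first.
def classify_german(tokens):
    return predict(embed_document(tokens, de_vecs) @ W, centroids)


print(classify_german(["handel", "steuer"]))
print(classify_german(["wasser", "wald", "klima"]))
```

On this toy data the two German documents should be assigned to economy and environment respectively, even though each contains a word that was not in the seed dictionary. With real embeddings the mapping is only approximate, so the held-out labeled manifestos remain necessary for evaluation.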

Requirements: Skilled programmer.