Computer-based Content Analysis – Text HWS 2017

 

Title: Computer-based Content Analysis
Type of course: Lectures and practical exercises
Level: Ph.D. (Social Sciences)
ECTS points: 5
Language: English
Time and place of lectures: Fri 08.04., 22.04., 29.04., 20.05.; SoWi-PC Pool A5,6, C-108
Lecturer: Sanja Stajner and Federico Nanni

Prerequisites:

  • Foundations of linear algebra and probability theory (high school level)
  • Computer skills sufficient to get familiar with complex applications quickly

Grading is based on:

  • Implementation of a project
  • Final presentation
  • Report (~ 15 pages)

Attendance Modalities (new!):

  • Lecture part: attendance *voluntary*
  • Project presentation part: attendance *mandatory*

 

Content of the Lecture

The course presents methods for the computer-assisted, automatic analysis of digital documents as a basis for further quantitative content analyses in the social and cultural sciences.

The course begins with an overview of the analyses that computational linguistics can offer the social and cultural sciences, using the software GATE. A short programming course in Python follows; it introduces a more flexible way of preprocessing texts (using NLTK) and covers access to text data through web crawling and the conversion of different file formats. We then show some advanced methods for text classification and clustering, along with several tools that implement them. In the second part of the course, participants present their own project work.
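As a rough illustration of the steps named above (file format conversion, text preprocessing, and counting as a basis for quantitative analysis), here is a minimal Python sketch. It deliberately uses only the standard library instead of NLTK or GATE, so it runs without extra downloads. The class `TextExtractor`, the helper functions, the tiny stop word list, and the sample HTML are assumptions made for this example and are not taken from the course material.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Convert an HTML string to plain text with normalized whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical, deliberately tiny stop word list (real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens with a regular expression."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def term_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop-word token occurs in the text."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


# Invented sample document, standing in for a crawled web page.
SAMPLE_HTML = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Content Analysis</h1>"
    "<p>The analysis of content is the basis for content coding.</p>"
    "</body></html>"
)

if __name__ == "__main__":
    text = html_to_text(SAMPLE_HTML)
    for term, count in term_frequencies(text).most_common(3):
        print(term, count)
```

The regular expression tokenizer only handles unaccented Latin letters; a library tokenizer such as the ones in NLTK would be the more robust choice for real corpora.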

Dates and Topics

Date | Topic | Material (PDF) | Exercises (PDF)

Introduction
08.04. | Overview & Goals (CZ); Introduction to Named Entity Recognition & GATE | Introduction & NER/Gate |
08.04. | Regular Expressions (CZ) | Regular Expressions | Assignment 1, Text for assignment 1

Programming with Python
22.04. | Introduction to Python (SS) | Python Intro | Free interactive Python lessons (try to complete lessons 1-4)
22.04. | Introduction to Python II (SS) | Python Intro II |
22.04. | Text preprocessing with NLTK (SS) | NLTK | Assignment, CodeSnippet, Webpage

Machine Learning
29.04. | Crawling Websites & Document Conversion (CZ) | Crawling | Assignment: a) crawler, b) converter_sceleton
29.04. | Machine Learning: Text Classification and Clustering (SS) | Weka, Machine Learning |
29.04. | Project Assignments | |

Project Work
 | Project Time without Lectures | |
 | Project Time without Lectures | ☕ |
20.05. | Presentations | |

Reading recommendations

Description | Title
Application of NLP methods (NLTK) | Social Media Mining of the Icelandic Blogosphere
Application of NLP methods | Automated Discovery and Analysis of Social Networks from Threaded Discussions

Exercises

We will hand out (non-mandatory) exercises that will help you understand the technologies and methods presented. We strongly suggest that you take the time to work on them. In our experience, hands-on exercises make it much easier to follow a course like this.