Focus Group: Web Data Mining (Prof. Paulheim)

The Web Data Mining group targets two main topics. First, we look at using structured and semi-structured web data as background knowledge in data mining problems. We develop methods for efficiently accessing such web data in data mining, as well as mining algorithms tailored to the particularities of such data. Second, we use data mining methods to create and improve large-scale web corpora. Here, we look into machine learning methods for completing missing knowledge, as well as methods for identifying wrong pieces of information.

Master and Bachelor Theses

Object detection in images from news articles is a very challenging task. On the one hand, available training data for object detectors is only...

Introduction/problem: Speculation/hedging/vagueness identification plays a significant role in many applications, e.g. information extraction, machine...

The "ius commune" or "learned laws" (= "Roman and canon law" of the Middle Ages) are full of citations which follow a set of generally common rules...

The paper "What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank" by Stefan Neumann, Rainer Gemulla, and Pauli Miettinen has been...

The paper "DESQ: Frequent Sequence Mining with Subsequence Constraints" by Kaustubh Beedkar and Rainer Gemulla has been accepted at the 2016 IEEE...

The finals of the Data Science Game 2016 took place at the castle Les Fontaines near Paris from September 9th to 11th. For the second consecutive year, a...

The Web offers a goldmine of information describing a multitude of companies whose products and services can be potentially matched against Web users’...

Recently, there has been much interest in exploiting Web-scale resources like the CommonCrawl for intelligent text processing and information extraction...

Recently, we started investigating methods and frameworks to automatically extract high-quality hypernym relations from Web-scale amounts of data,...

The DWS group is happy to announce a new release of the Web Data Commons RDFa, Microdata, Embedded JSON-LD and Microformat data corpus.

The data...

Publications

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007