Student Assistant for Web-Scale Knowledge Harvesting

We are looking for a student assistant to develop and conduct experiments on building fact databases from the Common Crawl, a very large corpus of crawled web pages.

Requirements are:

  • Solid programming skills in Java,
  • Experience with Hadoop and/or working with very large datasets (or the willingness to learn),
  • Previous knowledge of natural language processing is a plus.
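To give a concrete impression of the kind of work involved, below is a minimal sketch of a Hadoop MapReduce job in Java that scans already-extracted page text (e.g., from Common Crawl WET files) for a toy "X is a Y" pattern and counts how often each candidate fact occurs across the crawl. The class names, the extraction pattern, and the plain-text input assumption are purely illustrative and not part of the actual project.

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ToyFactExtraction {

        // Mapper: emits each "X is a Y" match as a candidate (subject, is-a, object) fact.
        public static class FactMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final Pattern IS_A = Pattern.compile("(\\w+) is a (\\w+)");
            private static final IntWritable ONE = new IntWritable(1);
            private final Text fact = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                Matcher m = IS_A.matcher(value.toString());
                while (m.find()) {
                    fact.set(m.group(1) + "\tis-a\t" + m.group(2));
                    context.write(fact, ONE);
                }
            }
        }

        // Reducer (also used as combiner): sums the occurrence counts of each candidate fact.
        public static class FactReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "toy fact extraction");
            job.setJarByClass(ToyFactExtraction.class);
            job.setMapperClass(FactMapper.class);
            job.setCombinerClass(FactReducer.class);
            job.setReducerClass(FactReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input: plain-text files
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output: fact<TAB>count
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }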

Contact: Kai Eckert or Simone Ponzetto.

Note: We also provide topics for a Master's thesis in the same area.