Prof. Dr. Rainer Gemulla

Chair of Practical Computer Science I: Data Analytics

Universität Mannheim
Parkring 39, Room 105
D-68159 Mannheim

Tel.: +49 621 181 2480

rgemulla (at) uni-mannheim.de



I head the Chair of Practical Computer Science I: Data Analytics at the University of Mannheim. The chair is part of the Data and Web Science Group.


We have an opening for a PhD position; see here.

If you want to write a Bachelor's or Master's thesis with us, see here.

If you are looking for a student job, please approach me directly.

News

See here.

Research Interests

  • Data analysis and data mining
  • Text mining and information extraction
  • Optimization
  • Approximation techniques
  • Algorithms for modern hardware

Curriculum Vitae

Since 2014    W3-Professor for Practical Computer Science I, Universität Mannheim, Germany
2010 - 2014    Senior researcher / group leader, Max-Planck-Institut für Informatik, Saarbrücken, Germany
2008 - 2010    Postdoctoral researcher, IBM Almaden Research Center, San Jose, CA, USA
2004 - 2008    PhD in Computer Science, Technische Universität Dresden, Germany

Awards

  • Junior fellow of the Gesellschaft für Informatik (GI), 2013
  • AWS in Education Research Grant Award, 2013
  • Busy Beaver teaching award (winter term 2012/2013, for Non-Traditional Data Management (NoSQL and more))
  • IBM's 2011 Pat Goldberg Memorial best paper award in CS, EE and Math
    (for "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent" with P. J. Haas, E. Nijkamp, and Y. Sismanis; KDD 2011)
  • Best paper of NIPS 2011 Biglearn workshop
    (for "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent" with P. J. Haas, Y. Sismanis, C. Teflioudi, and F. Makari)
  • Google Focused Research Award 2011: Robust and Scalable Fact Discovery from Web Sources
    (with G. Weikum and M. Theobald)
  • Research Highlight in Communications of the ACM
    (for "Distinct-Value Synopses for Multiset Operations" with K. Beyer, P. J. Haas, B. Reinwald, and Y. Sismanis)
  • The VLDB Journal, Special Issue: Best Papers of VLDB 2006
    (for "A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets" with W. Lehner and P. J. Haas)

Ph.D. Students

Former students

  • Dr. Faraz Makari Manshadi, moved to IBM Almaden Research Center

Teaching

See here for thesis topics offered by our chair.

Current semester (HWS 2015)

Previous semesters

Publications

2015    L. Del Corro, A. Abujabal, R. Gemulla, G. Weikum
FINET: Context-Aware Fine-Grained Named Entity Typing [pdf, resources]
To appear in EMNLP, 2015
F. Petroni, L. Del Corro, R. Gemulla
CORE: Context-Aware Open Relation Extraction with Factorization Machines [pdf, resources]
To appear in EMNLP, 2015
K. Beedkar, K. Berberich, R. Gemulla, I. Miliaraki
Closing the Gap: Sequence Mining at Scale [pdf (journal version), pdf (author version), resources]
In TODS, 40(2), Art. 8, 2015
C. Teflioudi, R. Gemulla, O. Mykytiuk
LEMP: Fast Retrieval of Large Entries in a Matrix Product [pdf, resources]
In SIGMOD, pp. 107-122, 2015
K. Beedkar, R. Gemulla
LASH: Large-Scale Sequence Mining with Hierarchies [pdf, source code]
In SIGMOD, pp. 491-503, 2015
2014    L. Del Corro, R. Gemulla, G. Weikum
Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning [pdf, resources]
In EMNLP, pp. 374-385, 2014
P. Roy, J. Teubner, R. Gemulla
Low-Latency Handshake Join [pdf]
In PVLDB, 7(9), pp. 709-720, 2014
L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, G. Weikum
Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM [pdf]
In TACL, 2, pp. 155-168, 2014
D. Erdös, R. Gemulla, E. Terzi
Reconstructing Graphs from Neighborhood Data [pdf (author version), pdf (journal version)]
In TKDD, 8(4), 2014
2013    F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis
Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion [pdf (author version), pdf (journal version)]
In KAIS (special issue: best papers of ICDM 2012), pp. 1-31, 2013
F. Makari, R. Gemulla
A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs [pdf]
In NIPS 2013 Biglearn workshop (poster), 2013
F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, M. Sozio
A Distributed Algorithm for Large-Scale Generalized Matching [pdf]
In PVLDB, 6(9), pp. 613-624, 2013
Note: The analysis of the number of binary search steps (Lemma 2) contains a bug; see our NIPS 2013 Biglearn workshop paper for a corrected version.
I. Miliaraki, K. Berberich, R. Gemulla, S. Zoupanos
Mind the Gap: Large-Scale Frequent Sequence Mining [pdf, slides, resources]
In SIGMOD, pp. 797-808, 2013
L. Del Corro, R. Gemulla
ClausIE: Clause-Based Open Information Extraction [pdf, slides, resources]
In WWW, pp. 355-366, 2013
R. Gemulla, P. J. Haas, W. Lehner
Non-Uniformity Issues and Workarounds in Bounded-Size Sampling [pdf (author version), pdf (journal version), source code]
In The VLDB Journal, 22(6), pp. 753-772, 2013
K. Beedkar, L. Del Corro, R. Gemulla
Fully Parallel Inference in Markov Logic Networks [pdf]
In BTW, pp. 205-224, 2013
2012    D. Erdös, R. Gemulla, E. Terzi
Reconstructing Graphs from Neighborhood Data [pdf, slides]
In ICDM, pp. 231-240, 2012
C. Teflioudi, F. Makari, R. Gemulla
Distributed Matrix Completion [pdf, slides]
In ICDM, pp. 655-664, 2012
L. Qu, R. Gemulla, G. Weikum
A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts [pdf]
In EMNLP-CoNLL, pp. 149-159, 2012
2011    R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides]
In NIPS 2011 Biglearn workshop, 2011 (best paper award)

R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides]
In KDD, pp. 69-77, 2011

K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Eltabakh, C. C. Kanne, F. Özcan, E. Shekita
Jaql: A Scripting Language for Large Scale Semistructured Data Analysis [pdf]
In PVLDB (industrial track), 4(11), pp. 1272-1283, 2011

M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson
CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop [pdf]
In PVLDB, 4(9), pp. 575-585, 2011

R. Gemulla, P. J. Haas, E. Nijkamp, Y. Sismanis
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf]
IBM Research Report RJ10481, March 2011 (revised February 2013)

B. Schlegel, R. Gemulla, W. Lehner
Memory-Efficient Frequent-Itemset Mining [pdf]
In EDBT, pp. 461-472, 2011
2010    S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, J. McPherson
Ricardo: Integrating R and Hadoop [pdf]
In SIGMOD (industrial track), pp. 987-998, 2010

B. Schlegel, R. Gemulla, W. Lehner
Fast Integer Compression using SIMD Instructions [pdf]
In DAMON, pp. 34-40, 2010
2009    K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, Y. Sismanis
Distinct-Value Synopses for Multiset Operations [pdf]
In Commun. ACM, 52(10), pp. 87-95, 2009
Technical perspective by Surajit Chaudhuri.

B. Schlegel, R. Gemulla, W. Lehner
k-Ary Search on Modern Processors [pdf, slides]
In DAMON, pp. 52-60, 2009
2008    R. Gemulla
Sampling Algorithms for Evolving Datasets [pdf, summary, slides]
Ph.D. thesis, Technische Universität Dresden, 2009
URL for citations: http://nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644

R. Gemulla, P. Rösch, W. Lehner
Linked Bernoulli Synopses: Sampling Along Foreign Keys [pdf, slides]
In SSDBM, pp. 6-23, 2008

R. Gemulla, W. Lehner
Sampling Time-Based Sliding Windows in Bounded Space [pdf, slides]
In SIGMOD, pp. 379-392, 2008

P. Rösch, R. Gemulla, W. Lehner
Designing Random Sample Synopses with Outliers [pdf, poster]
In ICDE (poster), pp. 1400-1402, 2008
2007    R. Gemulla, W. Lehner, P. J. Haas
Maintaining Bounded-Size Sample Synopses of Evolving Datasets [pdf]
In The VLDB Journal, Special Issue: Best Papers of VLDB 2006, pp. 173-201, 2007
Note: The resizing algorithm proposed in this article contains a bug; see my Ph.D. thesis or our 2013 VLDB Journal paper for a corrected version.

K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, R. Gemulla
On Synopses for Distinct-Value Estimation Under Multiset Operations [pdf, slides]
In SIGMOD, pp. 199-210, 2007

R. Gemulla, W. Lehner, P. J. Haas
Maintaining Bernoulli Samples over Evolving Multisets [pdf, slides]
In PODS, pp. 93-102, 2007
2006    R. Gemulla, W. Lehner, P. J. Haas
A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets [pdf, slides]
In VLDB, pp. 595-606, 2006

A. Klein, R. Gemulla, P. Rösch, W. Lehner
Derby/S: A DBMS for Sample-Based Query Answering [pdf, poster1, poster2]
In SIGMOD (demo), pp. 757-759, 2006

R. Gemulla, W. Lehner
Deferred Maintenance of Disk-Based Random Samples [pdf, slides]
In EDBT, pp. 423-441, 2006