Prof. Dr. Rainer Gemulla

Chair of Practical Computer Science I: Data Analytics

Universität Mannheim
Parkring 39, Raum 105
D-68131 Mannheim

Tel.: +49 621 181 2480

rgemulla (at) uni-mannheim.de

Curriculum Vitae

Since 2014    W3-Professor for Practical Computer Science I, Universität Mannheim, Germany
2010 - 2014    Senior researcher / group leader, Max-Planck-Institut für Informatik, Saarbrücken, Germany
2008 - 2010    Postdoctoral researcher, IBM Almaden Research Center, San Jose, CA, USA
2004 - 2008    PhD in Computer Science, Technische Universität Dresden, Germany

Research Interests

  • Data analysis and data mining
  • Text mining and information extraction
  • Optimization
  • Approximation techniques
  • Algorithms for modern hardware

Teaching

Current semester (HWS 2014)

Previous semesters

Awards

  • Junior fellow of the Gesellschaft für Informatik (GI)
  • AWS in Education Research Grant Award, 2013
  • Busy Beaver teaching award (winter term 2012/2013, for Non-Traditional Data Management (NoSQL and more))
  • IBM's 2011 Pat Goldberg Memorial best paper award in CS, EE and Math
    (for "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent" with P. J. Haas, E. Nijkamp, and Y. Sismanis; KDD 2011)
  • Best paper of NIPS 2011 Biglearn workshop
    (for "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent" with P. J. Haas, Y. Sismanis, C. Teflioudi, and F. Makari)
  • Google Focused Research Award 2011: Robust and Scalable Fact Discovery from Web Sources
    (with G. Weikum and M. Theobald)
  • Research Highlight in Communications of the ACM
    (for "Distinct-Value Synopses for Multiset Operations" with K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis)
  • The VLDB Journal, Special Issue: Best Papers of VLDB 2006
    (for "A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets" with W. Lehner and P. J. Haas)

Publications

2014    P. Roy, J. Teubner, R. Gemulla
Low-Latency Handshake Join [pdf]
In PVLDB, 7(9), pp. 709-720, 2014

L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, G. Weikum
Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM [pdf]
In TACL, 2, pp. 155-168, 2014

D. Erdös, R. Gemulla, E. Terzi
Reconstructing Graphs from Neighborhood Data [pdf (author version), pdf (journal version)]
In TKDD, 8(4), 2014
2013    F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis
Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion [pdf (author version), pdf (journal version)]
In KAIS (special issue: best papers of ICDM 2012), pp. 1-31, 2013

F. Makari, R. Gemulla
A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs [pdf]
In NIPS 2013 Biglearn workshop (poster), 2013

F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, M. Sozio
A Distributed Algorithm for Large-Scale Generalized Matching [pdf]
Note: The analysis of the number of binary search steps (Lemma 2) contains a bug; see our Biglearn paper for a corrected version.
In PVLDB, 6(9), pp. 613-624, 2013

I. Miliaraki, K. Berberich, R. Gemulla, S. Zoupanos
Mind the Gap: Large-Scale Frequent Sequence Mining [pdf, slides, resources]
In SIGMOD, pp. 797-808, 2013

L. Del Corro, R. Gemulla
ClausIE: Clause-Based Open Information Extraction [pdf, slides, resources]
In WWW, pp. 355-366, 2013

R. Gemulla, P. J. Haas, W. Lehner
Non-Uniformity Issues and Workarounds in Bounded-Size Sampling [pdf (author version), pdf (journal version), source code]
In The VLDB Journal, 22(6), pp. 753-772, 2013

K. Beedkar, L. Del Corro, R. Gemulla
Fully Parallel Inference in Markov Logic Networks [pdf]
In BTW, pp. 205-224, 2013
2012    D. Erdös, R. Gemulla, E. Terzi
Reconstructing Graphs from Neighborhood Data [pdf, slides]
In ICDM, pp. 231-240, 2012

C. Teflioudi, F. Makari, R. Gemulla
Distributed Matrix Completion [pdf, slides]
In ICDM, pp. 655-664, 2012

L. Qu, R. Gemulla, G. Weikum
A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts [pdf]
In EMNLP-CoNLL, pp. 149-159, 2012
2011    R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides]
In NIPS 2011 Biglearn workshop, 2011 (best paper award)

R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides]
In KDD, pp. 69-77, 2011

K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Eltabakh, C.C. Kanne, F. Ozcan, E. Shekita
Jaql: A Scripting Language for Large Scale Semistructured Data Analysis [pdf]
In PVLDB (industrial track), 4(11), pp. 1272-1283, 2011

M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson
CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop [pdf]
In PVLDB, 4(9), pp. 575-585, 2011

R. Gemulla, P. J. Haas, E. Nijkamp, Y. Sismanis
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf]
IBM Research Report RJ10481, March 2011 (revised February 2013)

B. Schlegel, R. Gemulla, W. Lehner
Memory-Efficient Frequent-Itemset Mining [pdf]
In EDBT, pp. 461-472, 2011
2010    S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, J. McPherson.
Ricardo: Integrating R and Hadoop [pdf]
In SIGMOD (industrial track), pp. 987-998, 2010

B. Schlegel, R. Gemulla, W. Lehner.
Fast Integer Compression using SIMD Instructions [pdf]
In DAMON, pp. 34-40, 2010
2009    K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, Y. Sismanis.
Distinct-Value Synopses for Multiset Operations [pdf]
In Commun. ACM, 52(10), pp. 87-95, 2009
Technical perspective by Surajit Chaudhuri.

B. Schlegel, R. Gemulla, W. Lehner.
k-Ary Search on Modern Processors [pdf, slides]
In DAMON, pp. 52-60, 2009
2008    R. Gemulla.
Sampling Algorithms for Evolving Datasets [pdf, summary, slides]
Ph.D. thesis, Technische Universität Dresden, 2009
URL for citations: http://nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644

R. Gemulla, P. Rösch and W. Lehner.
Linked Bernoulli Synopses: Sampling Along Foreign Keys [pdf, slides]
In SSDBM, pp. 6-23, 2008

R. Gemulla and W. Lehner.
Sampling Time-Based Sliding Windows in Bounded Space [pdf, slides]
In SIGMOD, pp. 379-392, 2008

P. Rösch, R. Gemulla and W. Lehner.
Designing Random Sample Synopses with Outliers [pdf, poster]
In ICDE (poster), pp. 1400-1402, 2008
2007    R. Gemulla, W. Lehner and P. J. Haas.
Maintaining Bounded-Size Sample Synopses of Evolving Datasets [pdf]
In The VLDB Journal, Special Issue: Best Papers of VLDB 2006, pp. 173-201, 2007
Note: The resizing algorithm proposed in this article contains a bug; see my Ph.D. thesis or our 2013 VLDB Journal paper for a corrected version.

K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis and R. Gemulla.
On Synopses for Distinct-Value Estimation Under Multiset Operations [pdf, slides]
In SIGMOD, pp. 199-210, 2007

R. Gemulla, W. Lehner and P. J. Haas.
Maintaining Bernoulli Samples over Evolving Multisets [pdf, slides]
In PODS, pp. 93-102, 2007
2006    R. Gemulla, W. Lehner and P. J. Haas.
A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets [pdf, slides]
In VLDB, pp. 595-606, 2006

A. Klein, R. Gemulla, P. Rösch and W. Lehner.
Derby/S: A DBMS for Sample-Based Query Answering [pdf, poster1, poster2]
In SIGMOD (demo), pp. 757-759, 2006

R. Gemulla and W. Lehner.
Deferred Maintenance of Disk-Based Random Samples [pdf, slides]
In EDBT, pp. 423-441, 2006