Focus Group: Data Analytics

(Prof. Gemulla)

Our group's research focuses on systems and methods for analyzing and learning from large datasets, as well as on their application in practice, including:

Machine learning with semi-structured and structured data
Combining unstructured and structured knowledge
Representation learning for multi-relational graphs
Efficient and scalable methods and systems for data-intensive processing

News

Paper accepted at EMNLP Findings 2023: A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs

The paper “A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs” by Adrian Kochsiek and Rainer Gemulla has been accepted at the Findings of the Association for Computational Linguistics: EMNLP 2023. Abstract: Semi-inductive link prediction (LP) in knowledge graphs (KG) is the task ...

Paper accepted at CIKM 2023: Good Intentions: Adaptive Parameter Management via Intent Signaling

The paper “Good Intentions: Adaptive Parameter Management via Intent Signaling” by Alexander Renz-Wieland, Andreas Kieslinger, Robert Gericke, Rainer Gemulla, Zoi Kaoudi, and Volker Markl has been accepted at the 2023 ACM International Conference on Information and Knowledge Management (CIKM). Abstract: Model ...

Paper accepted at RepL4NLP 2023: Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction

The paper “Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction” by Adrian Kochsiek, Apoorv Saxena, Inderjeet Nair, and Rainer Gemulla has been accepted at the 2023 RepL4NLP Workshop on Representation Learning for NLP, hosted by ACL 2023. Abstract: We propose KGT5-context, a ...

Paper accepted at ECML-PKDD 2022: Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings

The paper “Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings” by Adrian Kochsiek, Fritz Niesel, and Rainer Gemulla has been accepted at the 2022 ECML-PKDD European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in ...


People

Chair

Prof. Dr. Rainer Gemulla

Office

Kerstin Maier

PhD Students

Former PhD students

Kaustubh Beedkar, Luciano del Corro, Kiril Gashteovski, Stefan Kain, Faraz Makari Manshadi, Alexander Renz-Wieland, Christina Teflioudi, Yanjie Wang

Data and Software

AdaPM: A fully adaptive parameter manager
LibKGE: A knowledge graph embedding library
Lapse: A parameter server with dynamic parameter allocation
OPIEC: An open information extraction corpus
MinIE: Open information extractor (spiritual successor to ClausIE)
DSGDpp: Various parallel algorithms for matrix factorization (including DSGD++)
DESQ: Frequent sequence mining with subsequence constraints
Rounding rank: algorithms for computing rounding-rank decompositions
CORE: Context-aware open relation extraction with factorization machines
FINET: Context-aware fine-grained named entity typing
Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning
ClausIE: Clause-Based Open Information Extraction
LEMP: Fast Retrieval of Large Entries in a Matrix Product
LASH: Large-Scale Sequence Mining with Hierarchies
MG-FSM: Large-Scale Frequent Sequence Mining

Teaching

If you are interested in writing a seminar paper, Bachelor's thesis, or Master's thesis with us, please read our guidelines first.

Current semester (FSS 2024)

CS 303: Praktische Informatik II (Bachelor course)
IE 678: Deep Learning (Master course)
CS 707: Data and Web Science Seminar (Master seminar)
Data Analytics Team Project: Your Project, Your Team
Colloquium (for PhD candidates)

Previous semester (HWS 2023)

CS 560: Large-Scale Data Management (Master course)
IE 675b: Machine Learning (Master course)
SM 445: Data and Web Science Seminar (Bachelor seminar)
Team Project: Modelling Real Energy Consumption in Industrial Production (with Sun Chemical)
Colloquium (for PhD candidates)

Publications

See also Google Scholar and DBLP.

2023	A. Kochsiek, R. Gemulla A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs [pdf, resources] In EMNLP Findings, 2023
	A. Kochsiek, A. Saxena, I. Nair, R. Gemulla Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction [pdf, resources] In RepL4NLP workshop, 2023
	A. Renz-Wieland, A. Kieslinger, R. Gericke, R. Gemulla, Z. Kaoudi, V. Markl Good Intentions: Adaptive Parameter Management via Intent Signaling [pdf, resources] In CIKM, 2023
2022	A. Kochsiek, F. Niesel, R. Gemulla Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings [pdf, resources] In ECML-PKDD, 2022
	A. Saxena, A. Kochsiek, R. Gemulla Sequence-to-Sequence Knowledge Graph Completion and Question Answering [pdf, video, resources] In ACL, pp. 2814–2828, 2022
	A. Renz-Wieland, R. Gemulla, Z. Kaoudi, V. Markl NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access [pdf, source code] In SIGMOD, pp. 481–495, 2022
2021	A. Kochsiek, R. Gemulla Parallel Training of Knowledge Graph Embedding Models: A Comparison of Techniques [pdf, resources] In PVLDB, 15(3), 2021
	A. Renz-Wieland, T. Drobisch, R. Gemulla, Z. Kaoudi, V. Markl Just Move It! Dynamic Parameter Allocation in Action [pdf, demo] In PVLDB (demo), 14(12), 2021
2020	A. Renz-Wieland, R. Gemulla, S. Zeuch, V. Markl Dynamic Parameter Allocation in Parameter Servers [pdf, source code] In PVLDB, 13(12), pp. 1877–1890, 2020
	S. Broscheit, K. Gashteovski, Y. Wang, R. Gemulla Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction [pdf, resources] In ACL, 2020
	D. Ruffinelli, S. Broscheit, R. Gemulla You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings [pdf, video, resources, OpenReview] In ICLR, 2020
	S. Broscheit, D. Ruffinelli, A. Kochsiek, P. Betz, R. Gemulla LibKGE – A knowledge graph embedding library for reproducible research [pdf, source] In EMNLP (demo), 2020
	K. Gashteovski, R. Gemulla, B. Kotnis, S. Hertling, C. Meilicke On Aligning OpenIE Extractions with Knowledge Bases: A Case Study [pdf, slides, resources] In Eval4NLP, 2020
2019	Y. Wang, D. Ruffinelli, R. Gemulla, S. Broscheit, C. Meilicke On Evaluating Embedding Models for Knowledge Base Completion [pdf] In RepL4NLP workshop, 2019
	K. Beedkar, R. Gemulla, W. Martens A Unified Framework for Frequent Sequence Mining with Subsequence Constraints [pdf (journal version), pdf (author version), resources] In TODS, 2019
	K. Gashteovski, S. Wanner, S. Hertling, S. Broscheit, R. Gemulla OPIEC: An Open Information Extraction Corpus [pdf, poster, resources, OpenReview] In AKBC, 2019
	A. Renz-Wieland, M. Bertsch, R. Gemulla Scalable Frequent Sequence Mining With Flexible Subsequence Constraints [pdf, poster] In ICDE, 2019
Preprints (2019)	Y. Wang, S. Broscheit, R. Gemulla A Relational Tucker Decomposition for Multi-Relational Link Prediction [arXiv] 2019
2018	C. Meilicke, M. Fink, Y. Wang, D. Ruffinelli, R. Gemulla, and H. Stuckenschmidt Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion [pdf, resources] In ISWC, 2018
	J. Pfeiffer, S. Broscheit, R. Gemulla, M. Göschl A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval [pdf] In BioNLP workshop, 2018
	S. Broscheit, R. Gemulla, M. Keuper Learning Distributional Token Representations from Visual Features [pdf] In RepL4NLP workshop, 2018
	Y. Wang, R. Gemulla, H. Li On Multi-Relational Link Prediction with Bilinear Models [pdf, resources] In AAAI, 2018
2017	K. Gashteovski, R. Gemulla, L. del Corro MinIE: Minimizing Facts in Open Information Extraction [pdf, poster, resources] In EMNLP, pp. 2620–2630, 2017
	C. Teflioudi, R. Gemulla Exact and Approximate Maximum Inner Product Search with LEMP [pdf (journal version), pdf (author version), resources] In TODS, 42(1) Art. 5, 2017
2016	S. Neumann, R. Gemulla, P. Miettinen What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank [pdf, tech report, resources] In ICDM, pp. 380–389, 2016
	K. Beedkar, R. Gemulla DESQ: Frequent Sequence Mining with Subsequence Constraints [pdf, tech report, resources] In ICDM (short paper), pp. 793–798, 2016
2015	L. Del Corro, A. Abujabal, R. Gemulla, G. Weikum FINET: Context-Aware Fine-Grained Named Entity Typing [pdf, slides, resources] In EMNLP, pp. 868–878, 2015
	F. Petroni, L. Del Corro, R. Gemulla CORE: Context-Aware Open Relation Extraction with Factorization Machines [pdf, slides, resources] In EMNLP, pp. 1763–1773, 2015
	K. Beedkar, K. Berberich, R. Gemulla, I. Miliaraki Closing the Gap: Sequence Mining at Scale [pdf (journal version), pdf (author version), resources] In TODS, 40(2) Art. 8, 2015
	C. Teflioudi, R. Gemulla, O. Mykytiuk LEMP: Fast Retrieval of Large Entries in a Matrix Product [pdf, slides, resources] In SIGMOD, pp. 107–122, 2015
	K. Beedkar, R. Gemulla LASH: Large-Scale Sequence Mining with Hierarchies [pdf, slides, source code] In SIGMOD, pp. 491–503, 2015
	R. Gemulla A Self-Portrayal of GI Junior Fellow Rainer Gemulla: Data Analysis at Scale [pdf (journal version), pdf (author version)] In it – Information Technology, 57(2), pp. 130–132, 2015
2014	L. Del Corro, R. Gemulla, G. Weikum Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning [pdf, resources] In EMNLP, pp. 374–385, 2014
	P. Roy, J. Teubner, R. Gemulla Low-Latency Handshake Join [pdf] In PVLDB, 7(9), pp. 709–720, 2014
	L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, G. Weikum Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM [pdf] In TACL, 2, pp. 155–168, 2014
	D. Erdös, R. Gemulla, E. Terzi Reconstructing Graphs from Neighborhood Data [pdf (author version), pdf (journal version)] In TKDD, 8(4), 2014
2013	F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion [pdf (author version), pdf (journal version), source code] In KAIS (special issue: best papers of ICDM 2012), pp. 1–31, 2013
	F. Makari, R. Gemulla A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs [pdf] In NIPS 2013 Biglearn workshop (poster), 2013
	F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, M. Sozio A Distributed Algorithm for Large-Scale Generalized Matching [pdf, slides] In PVLDB, 6(9), pp. 613–624, 2013. Erratum: The analysis of the number of binary search steps (Lemma 2) contains a bug; see our Biglearn paper for a corrected version.
	I. Miliaraki, K. Berberich, R. Gemulla, S. Zoupanos Mind the Gap: Large-Scale Frequent Sequence Mining [pdf, slides, resources] In SIGMOD, pp. 797–808, 2013
	L. Del Corro, R. Gemulla ClausIE: Clause-Based Open Information Extraction [pdf, slides, resources] In WWW, pp. 355–366, 2013
	R. Gemulla, P. J. Haas, W. Lehner Non-Uniformity Issues and Workarounds in Bounded-Size Sampling [pdf (author version), pdf (journal version), source code] In The VLDB Journal, 22(6), pp. 753–772, 2013
	K. Beedkar, L. Del Corro, R. Gemulla Fully Parallel Inference in Markov Logic Networks [pdf] In BTW, pp. 205–224, 2013
2012	D. Erdös, R. Gemulla, E. Terzi Reconstructing Graphs from Neighborhood Data [pdf, slides] In ICDM, pp. 231–240, 2012
	C. Teflioudi, F. Makari, R. Gemulla Distributed Matrix Completion [pdf, slides, source code] In ICDM, pp. 655–664, 2012
	L. Qu, R. Gemulla, G. Weikum A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts [pdf] In EMNLP-CoNLL, pp. 149–159, 2012
2011	R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code] In NIPS 2011 Biglearn workshop, 2011 (best paper award)
	R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code] In KDD, pp. 69–77, 2011
	K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Eltabakh, C.C. Kanne, F. Ozcan, E. Shekita Jaql: A Scripting Language for Large Scale Semistructured Data Analysis [pdf] In PVLDB (industrial track), 4(11), pp. 1272–1283, 2011
	M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop [pdf] In PVLDB, 4(9), pp. 575–585, 2011
	R. Gemulla, P. J. Haas, E. Nijkamp, Y. Sismanis Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf] IBM Research Report RJ10481, March 2011 (revised February 2013)
	B. Schlegel, R. Gemulla, W. Lehner Memory-Efficient Frequent-Itemset Mining [pdf] In EDBT, pp. 461–472, 2011
2010	S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, J. McPherson. Ricardo: Integrating R and Hadoop [pdf] In SIGMOD (industrial track), pp. 987–998, 2010
	B. Schlegel, R. Gemulla, W. Lehner. Fast Integer Compression using SIMD Instructions [pdf] In DAMON, pp. 34–40, 2010
2009	K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, Y. Sismanis. Distinct-Value Synopses for Multiset Operations [pdf, technical perspective by Surajit Chaudhuri] In Commun. ACM, 52(10), pp. 87–95, 2009
	B. Schlegel, R. Gemulla, W. Lehner. k-Ary Search on Modern Processors [pdf, slides] In DAMON, pp. 52–60, 2009
2008	R. Gemulla. Sampling Algorithms for Evolving Datasets [pdf, summary, slides] Ph.D. thesis, Technische Universität Dresden, 2008. URL for citations: nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644
	R. Gemulla, P. Rösch and W. Lehner. Linked Bernoulli Synopses: Sampling Along Foreign Keys [pdf, slides] In SSDBM, pp. 6–23, 2008
	R. Gemulla and W. Lehner. Sampling Time-Based Sliding Windows in Bounded Space [pdf, slides] In SIGMOD, pp. 379–392, 2008. Erratum: As observed by Hu et al., the lower bound of Ω(k log N) stated in Theorem 1 should read Ω(k log(N/k)).
	P. Rösch, R. Gemulla and W. Lehner. Designing Random Sample Synopses with Outliers [pdf, poster] In ICDE (poster), pp. 1400–1402, 2008
2007	R. Gemulla, W. Lehner and P. J. Haas. Maintaining Bounded-Size Sample Synopses of Evolving Datasets [pdf] In The VLDB Journal, Special Issue: Best Papers of VLDB 2006, pp. 173–201, 2007. Erratum: The resizing algorithm proposed in this article contains a bug; see my Ph.D. thesis or our 2013 VLDB Journal paper for a corrected version.
	K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis and R. Gemulla. On Synopses for Distinct-Value Estimation Under Multiset Operations [pdf, slides] In SIGMOD, pp. 199–210, 2007
	R. Gemulla, W. Lehner and P. J. Haas. Maintaining Bernoulli Samples over Evolving Multisets [pdf, slides] In PODS, pp. 93–102, 2007
2006	R. Gemulla, W. Lehner and P. J. Haas. A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets [pdf, slides] In VLDB, pp. 595–606, 2006
	A. Klein, R. Gemulla, P. Rösch and W. Lehner. Derby/S: A DBMS for Sample-Based Query Answering [pdf, poster1, poster2] In SIGMOD (demo), pp. 757–759, 2006
	R. Gemulla and W. Lehner. Deferred Maintenance of Disk-Based Random Samples [pdf, slides] In EDBT, pp. 423–441, 2006