The paper "DESQ: Frequent Sequence Mining with Subsequence Constraints" by Kaustubh Beedkar and Rainer Gemulla has been accepted at the 2016 IEEE International Conference on Data Mining (ICDM).
Frequent sequence mining methods often make use of constraints to control which subsequences should be mined; e.g., length, gap, span, regular-expression, and hierarchy constraints. We show that many subsequence constraints—including and beyond those considered in the literature—can be unified in a single framework. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners.