CS 703 Data Analytics Seminar
The Data Analytics Seminar takes place every semester and covers selected (recent/hot) topics in areas related to data analytics, such as large-scale data management, data mining & machine learning, NLP, or text analytics.
Goals
In this seminar, you will
- Read, understand, and explore scientific literature
- Summarize a current research topic in a concise report
- Give a presentation about your topic
- Moderate a scientific discussion about a topic of one of your fellow students
- Provide feedback on a report and a presentation of a fellow student
- Select your preferred topics and register by Feb 10 (see below)
- Attend the kickoff meeting in the second week of the semester
- You will be assigned a tutor, who provides guidance and one-to-one meetings
- Work individually throughout the semester: explore literature, write a report, peer-review, create a presentation
- Give your presentation and moderate a presentation of a fellow student in a block seminar at the end of the semester
Explore the list of topics below and select at least 3 topics of your preference. Send a ranked list of your selected topics via email to Kaustubh Beedkar by Feb 13, 2015. We will confirm your registration immediately. The actual topic assignment takes place at the kickoff meeting; our goal is, of course, to assign you one of your preferred topics.
This semester's seminar is structured into two tracks: a systems track and a data mining track. The systems track covers recent systems and architectural approaches for large-scale data analytics in various contexts. The data mining track focuses on novel approaches to specific data mining tasks and applications. The papers and articles listed below serve as an entry point to a topic; you are expected to explore related relevant literature on your own.
Systems track:
- Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C. Platt, James F. Terwilliger, and John Wernsing: Trill: A High Performance Incremental Query Processor for Diverse Analytics, in VLDB 2015.
- Yingyi Bu, Vinayak Borkar, Jianfeng Jia, Michael J. Carey, Tyson Condie: Pregelix: Big(ger) Graph Analytics on A Dataflow Engine, in VLDB 2015.
- Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani: Parallel Data Analysis Directly on Scientific File Formats, in SIGMOD 2014.
- Daniel Tahara, Thaddeus Diamond, Daniel Abadi: Sinew: a SQL System for Unified Analytics of Multi-structured Data, in SIGMOD 2014.
- Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J. Smola: Distributed large-scale natural graph factorization, in WWW 2013.
- Cong Xie, Ling Yan, Wu-Jun Li, Zhihua Zhang: Distributed Power-law Graph Computing: Theoretical and Empirical Analysis, in NIPS 2014.
- Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica: Spark: Cluster Computing with Working Sets, in HotCloud 2010.
Data mining track:
- Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han: Scalable Topical Phrase Mining from Text Corpora, in VLDB 2015.
- Saravanan Thirumuruganathan, Habibur Rahman, Sofiane Abbar, Gautam Das: Beyond Itemsets: Mining Frequent Featuresets over Structured Items, in VLDB 2015.
- Qiang Zeng, Jignesh M. Patel, David Page: QuickFOIL: Scalable Inductive Logic Programming, in VLDB 2014.
- Wanyun Cui, Yanghua Xiao, Haixun Wang, Wei Wang: Local Search of Communities in Large Graphs, in SIGMOD 2014.
- Albert Kim, Eric Blais, Aditya Parameswaran, Piotr Indyk, Sam Madden, Ronitt Rubinfeld: Rapid Sampling for Visualizations with Ordering Guarantees, in VLDB 2015.
- Bahman Bahmani, Kaushik Chakrabarti, Dong Xin: Fast Personalized PageRank on MapReduce, in SIGMOD 2011.
- Sebastian Riedel, Limin Yao, Andrew McCallum, Benjamin M. Marlin: Relation Extraction with Matrix Factorization and Universal Schemas, in NAACL 2013.
- Jeffrey Pennington, Richard Socher, Christopher D. Manning: GloVe: Global Vectors for Word Representation, in EMNLP 2014.
- Jiawei Han, Jian Pei, Yiwen Yin: Mining Frequent Patterns without Candidate Generation, in SIGMOD 2000.