We are an academic research group headed by Prof. Xu Chu, and we are part of the School of Computer Science at Georgia Tech. We are members of the Database Group and affiliate members of the ML center and the Institute for Data Engineering and Science at Georgia Tech.
We are broadly interested in data management and machine learning, and in particular in practical, challenging problems at the intersection of these two fields. Problems we are actively working on include machine learning for data cleaning and integration, data cleaning for machine learning, training data generation for image and tabular data, automatic feature engineering, and systems for managing machine learning analytics pipelines. For more information, please visit our research page and see the related graduate course we offer every year.
We are looking for passionate new PhD students, postdocs, and Master's students to join the team (more info)!
Follow us on Twitter
Our ZeroER work on performing entity resolution with zero labeled examples has been accepted to SIGMOD 2020
12/03/2019 Our GOGGLES work on domain-agnostic training data labeling has been accepted to SIGMOD 2020
08/07/2019 Our team, in collaboration with Alibaba, won third place in the KDD 2019 AutoML Challenge
08/01/2019 Our ACM book on data cleaning is up for sale on Amazon
04/25/2019 We are releasing CleanML, a benchmark for cleaning for ML
03/13/2019 We are releasing GOGGLES, a system for automatic generation of training data!
03/12/2019 Our PIClean system demo is accepted by SIGMOD 2019!
02/08/2019 After two years in the making, we have finally finished our manuscript for the data cleaning book! It will hopefully be published by ACM Books in early 2019. Stay tuned!
02/01/2019 We are excited to learn that we have been granted the 2019 JP Morgan Faculty Research Awards!