Publications

ZeroER: Entity Resolution using Zero Labeled Examples
Renzhi Wu, Sanya Chaba, Saurabh Sawlani, Xu Chu, Saravanan Thirumuruganathan
In Proceedings of the 2020 ACM SIGMOD Conference on Management of Data

GOGGLES: Automatic Training Data Generation with Affinity Coding
Nilaksh Das, Sanya Chaba, Sakshi Gandhi, Duen Horng Chau, Xu Chu
In Proceedings of the 2020 ACM SIGMOD Conference on Management of Data

Data Cleaning
Ihab F. Ilyas, Xu Chu
ACM Book Series

CleanML: A Benchmark for Joint Data Cleaning and Machine Learning [Experiments and Analysis]
Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang
arXiv, 2019

PIClean: a Probabilistic and Interactive Data Cleaning System
Zhuoran Yu, Xu Chu
In Proceedings of the 2019 ACM SIGMOD Conference on Management of Data, Amsterdam, Netherlands

Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformation
Yeye He, Xu Chu, Kris Ganjam, Yudian Zheng, Vivek Narasayya, Surajit Chaudhuri
In the 44th International Conference on Very Large Databases, VLDB 2018, Brazil

Data Cleaning (Book Chapter)
Xu Chu
In Encyclopedia of Big Data Technologies

Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel
Yeye He, Kris Ganjam, Yue Wang, Vivek Narasayya, Surajit Chaudhuri, Xu Chu, Yudian Zheng
In Proceedings of the 2018 ACM SIGMOD Conference on Management of Data, Houston, USA

HoloClean: Holistic Data Repairs with Probabilistic Inference
Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré
In the 43rd International Conference on Very Large Databases, VLDB 2017, Munich, Germany

Detecting Data Errors: Where are we and what needs to be done?
Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, Nan Tang
In the 42nd International Conference on Very Large Databases, VLDB 2016, New Delhi, India

Distributed Data Deduplication
Xu Chu, Ihab F. Ilyas, Paraschos Koutris
In the 42nd International Conference on Very Large Databases, VLDB 2016, New Delhi, India

Qualitative Data Cleaning
Xu Chu, Ihab F. Ilyas
In the 42nd International Conference on Very Large Databases, VLDB 2016, New Delhi, India
Slides

Data Cleaning: Overview and Emerging Challenges
Xu Chu, Ihab F. Ilyas, Sanjay Krishnan, Jiannan Wang
In Proceedings of the 2016 ACM SIGMOD Conference on Management of Data, San Francisco, USA
Slides

CLAMS: Bringing Quality to Data Lakes
Mina Farid, Alexandra Roatis, Ihab F. Ilyas, Hella-Franziska Hoffmann, Xu Chu
In Proceedings of the 2016 ACM SIGMOD Conference on Management of Data, San Francisco, USA

Trends in Cleaning Relational Data: Consistency and Deduplication
Ihab F. Ilyas, Xu Chu
In Foundations and Trends® in Databases, Volume 5, Issue 4, 2015

SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora
Yeye He, Kris Ganjam, Xu Chu
In the 41st International Conference on Very Large Databases, VLDB 2015, Kohala Coast, Hawai‘i, USA

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing
Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, Yin Ye
In Proceedings of the 2015 ACM SIGMOD Conference on Management of Data, Melbourne, Australia

TEGRA: Table Extraction by Global Record Alignment
Xu Chu, Yeye He, Kaushik Chakrabarti, Kris Ganjam
In Proceedings of the 2015 ACM SIGMOD Conference on Management of Data, Melbourne, Australia

KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing
Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, Yin Ye
In the 41st International Conference on Very Large Databases, VLDB 2015, Kohala Coast, Hawai‘i, USA

Discovering Denial Constraints
Xu Chu, Ihab F. Ilyas, Paolo Papotti
In the 40th International Conference on Very Large Databases, VLDB 2014, Hangzhou, China

RuleMiner: Data Quality Rules Discovery
Xu Chu, Ihab F. Ilyas, Paolo Papotti, Yin Ye
In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2014, Chicago, USA

Holistic Data Cleaning: Putting Violations into Context
Xu Chu, Ihab F. Ilyas, Paolo Papotti
In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia