Binary Theta-Joins using MapReduce: Efficiency Analysis and Improvements

Abstract

We study binary theta-joins in a MapReduce environment and make two contributions. First, we show that, when the join selectivity is high, the best known algorithm to date for this problem can reach the optimal trade-off between the size of the input a reducer can receive and the incurred communication cost. Second, when the join selectivity is low, we present improvements upon the state of the art that aim to decrease both the communication cost and the maximum load a reducer can receive, while also taking into account the load imbalance across the reducers.

Publication
In EDBT/ICDT 2014 Joint Conference
Ioannis Koumarelas
PhD graduate in Data Cleaning

My research interests include Data Cleaning, Artificial Intelligence, and Machine Learning.
