HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

Abouzeid, Azza, Bajda-Pawlikowski, Kamil, Abadi, Daniel, Rasin, Alexander and Silberschatz, Avi (2009). "HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads", VLDB'09: Proceedings of the 2009 VLDB Endowment

Many proponents of relational databases see non-relational distributed data stores such as key-value stores (e.g., Google's database) as a step backward.

Abouzeid et al. therefore introduce HadoopDB, a hybrid approach that combines the strengths of parallel databases (performance, SQL support) with the advantages of the Hadoop MapReduce framework (ability to run in heterogeneous environments, fault tolerance).

HadoopDB combines the following technologies.

  • the Hadoop MapReduce framework
  • Hive (a data warehouse infrastructure built on top of Hadoop) as a translation layer
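  • a PostgreSQL or MySQL database

Extending Hadoop's InputFormat implementation allows conventional databases to be integrated seamlessly with the framework: each node's database appears to MapReduce as just another data source. A specially designed query planner (based on Hive and HiveQL) translates SQL queries into MapReduce jobs. Where plain Hive generates jobs that read tables stored as files in HDFS, HadoopDB's planner generates jobs that connect to the node-local databases and push as much of the query work as possible down into them.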
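
The idea of pushing work into node-local databases and merging the partial results in a reduce step can be illustrated with a small, self-contained sketch. This is not HadoopDB's code: it uses in-memory SQLite databases as stand-ins for the per-node PostgreSQL or MySQL instances, and the `sales` table, the sample rows, and the helper names `make_node_db`, `map_phase`, and `reduce_phase` are all hypothetical and exist only for illustration.

```python
import sqlite3
from collections import defaultdict


def make_node_db(rows):
    """Create an in-memory SQLite database holding one partition of the data.

    Assumption: SQLite stands in for a node-local PostgreSQL/MySQL instance.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(node_dbs, pushed_down_sql):
    """Run the pushed-down SQL on every node and emit (key, value) pairs.

    This plays the role of a database-backed input source: the map side
    receives rows that the local database has already aggregated.
    """
    for conn in node_dbs:
        for key, value in conn.execute(pushed_down_sql):
            yield key, value


def reduce_phase(pairs):
    """Merge the per-node partial sums into one global sum per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Two hypothetical nodes, each holding a different partition of `sales`.
nodes = [
    make_node_db([("east", 10), ("west", 5), ("east", 7)]),
    make_node_db([("west", 20), ("east", 3)]),
]

# The part of the query that each node-local database can answer on its own.
PUSHED_DOWN_SQL = "SELECT region, SUM(amount) FROM sales GROUP BY region"

result = reduce_phase(map_phase(nodes, PUSHED_DOWN_SQL))
print(result)
```

With the sample rows above, the merged totals are 20 for `east` and 25 for `west`. The point of the split is that each database does the grouping and summing locally, so only one small partial result per region and node has to travel to the reduce step.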