HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads


Abouzeid, Azza, Bajda-Pawlikowski, Kamil, Abadi, Daniel, Rasin, Alexander and Silberschatz, Avi (2009). "HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads", VLDB'09: Proceedings of the 2009 VLDB Endowment.

Many proponents of relational databases see non-relational distributed data stores such as key-value stores (e.g., Google's database) as a step backward.

Abouzeid et al. therefore introduce HadoopDB, a hybrid approach that combines the strengths of parallel databases (performance, SQL compliance) with the advantages of the Hadoop MapReduce framework (ability to run in heterogeneous environments, fault tolerance).

HadoopDB combines the following technologies:

the Hadoop MapReduce framework
Hive (a data warehouse infrastructure built on top of Hadoop) as a translation layer
a PostgreSQL or MySQL database

Extending Hadoop's InputFormat implementation allows conventional databases to be integrated seamlessly with the framework. A purpose-built query planner (based on Hive and HiveQL) translates SQL queries into MapReduce jobs. Whereas plain Hive jobs read tables stored as files in HDFS, HadoopDB's jobs connect to the databases on the individual nodes and push as much of the query logic as possible down to them.
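The idea of running one query as a MapReduce job over node-local databases can be illustrated with a small sketch. This is not HadoopDB's code and does not use Hadoop or Hive at all: in-memory SQLite databases stand in for the PostgreSQL or MySQL instances, and the table name `visits`, the sample rows, and all function names are assumptions made up for this illustration. Each "map task" sends an aggregation query to one database, and the "reduce" step merges the partial results.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (url, country, revenue) rows.
ROWS = [
    ("a.com", "AT", 10.0),
    ("b.com", "CH", 5.0),
    ("c.com", "AT", 2.5),
    ("d.com", "DE", 7.0),
    ("e.com", "CH", 1.0),
]


def make_node_databases(rows, num_nodes):
    """Distribute rows round-robin over independent in-memory databases.

    Each SQLite connection stands in for the database on one cluster node.
    """
    nodes = [sqlite3.connect(":memory:") for _ in range(num_nodes)]
    for conn in nodes:
        conn.execute("CREATE TABLE visits (url TEXT, country TEXT, revenue REAL)")
    for i, row in enumerate(rows):
        nodes[i % num_nodes].execute("INSERT INTO visits VALUES (?, ?, ?)", row)
    return nodes


def map_phase(conn):
    """Map task: let the node-local database do the aggregation via SQL."""
    pushed_down_sql = "SELECT country, SUM(revenue) FROM visits GROUP BY country"
    # Yields (country, partial_sum) pairs for this node only.
    yield from conn.execute(pushed_down_sql)


def reduce_phase(partials):
    """Reduce task: merge the per-node partial sums into global totals."""
    totals = defaultdict(float)
    for country, subtotal in partials:
        totals[country] += subtotal
    return dict(totals)


def run_query(nodes):
    """Run the map phase on every node, then reduce the combined output."""
    partials = (pair for conn in nodes for pair in map_phase(conn))
    return reduce_phase(partials)


nodes = make_node_databases(ROWS, num_nodes=2)
result = run_query(nodes)
print(result)
```

The point of the sketch is the division of labour: the databases handle the SQL they are good at, while the surrounding framework only has to schedule the per-node tasks and combine their output. Scheduling, fault tolerance, and data loading, which Hadoop provides in the real system, are left out entirely.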


Albert Weichselbraun
