Dremel: Interactive Analysis of Web-Scale Datasets

by Melnik et al. in Proceedings of the 36th International Conference on Very Large Data Bases 2010

This paper covers Dremel, a scalable, interactive ad-hoc query system for the analysis of read-only nested data.

Design characteristics:

  1. Dremel operates in situ: it accesses and processes data in place, directly in a storage layer such as the Google File System or Bigtable, without a separate loading step.
  2. Analogous to a distributed DBMS, Dremel executes queries by means of a serving tree that (i) recursively rewrites queries and pushes them down to lower nodes, and (ii) aggregates the nodes' answers into the query result.
  3. It provides an SQL-like query interface.
  4. It uses a column-striped storage representation, into which structured (or nested) data such as XML is translated.
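
To make the serving tree from point 2 concrete, here is a minimal sketch in Python. It is not Dremel's implementation: the class names `Leaf` and `Inner`, the `execute` method, and the sample rows are all hypothetical, and the "rewrite" step is reduced to passing the same filter and grouping key down the tree. It only illustrates the shape of the idea: leaves compute partial counts over their own slice of the data, and each inner node sums the partial counts of its children.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Leaf:
    """Hypothetical leaf server holding one slice of the table."""
    rows: List[dict]

    def execute(self, predicate: Callable[[dict], bool], key: str) -> Dict[str, int]:
        # Scan only the local rows and return a partial count per group.
        counts: Dict[str, int] = {}
        for row in self.rows:
            if predicate(row):
                counts[row[key]] = counts.get(row[key], 0) + 1
        return counts


@dataclass
class Inner:
    """Hypothetical intermediate (or root) server in the tree."""
    children: List[Union["Inner", Leaf]]

    def execute(self, predicate: Callable[[dict], bool], key: str) -> Dict[str, int]:
        # Push the query down to every child, then sum the partial counts.
        merged: Dict[str, int] = {}
        for child in self.children:
            for group, partial in child.execute(predicate, key).items():
                merged[group] = merged.get(group, 0) + partial
        return merged


# Made-up sample data spread over three leaves.
tree = Inner([
    Inner([
        Leaf([{"country": "US", "bytes": 200}, {"country": "DE", "bytes": 50}]),
        Leaf([{"country": "US", "bytes": 150}]),
    ]),
    Leaf([{"country": "DE", "bytes": 300}, {"country": "US", "bytes": 10}]),
])

# Roughly: SELECT country, COUNT(*) WHERE bytes > 100 GROUP BY country
result = tree.execute(lambda row: row["bytes"] > 100, "country")
```

Because counts can be merged by addition, no node ever needs to see the raw rows of another node; aggregates that cannot be merged this way would need a different partial representation.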

The article also discusses algorithms for converting XML data into this storage format, Dremel's query language and query execution, and experiments on its scalability and execution time.
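
As a rough illustration of what column striping means, the sketch below splits nested records into one list per field path. It is a deliberate simplification and not the paper's algorithm: the function names `flatten` and `stripe`, the dotted path names, the `None` padding for missing fields, and the sample records are assumptions of this sketch, and repeated fields are not handled at all, which the paper's format does handle.

```python
from typing import Any, Dict, Iterator, List, Tuple


def flatten(record: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) pairs for every leaf field of a nested record."""
    for name, value in record.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def stripe(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Store all values of one field path together, one list per path."""
    columns: Dict[str, List[Any]] = {}
    for index, record in enumerate(records):
        flat = dict(flatten(record))
        # A path first seen now was missing in all earlier records.
        for path in flat:
            columns.setdefault(path, [None] * index)
        # Every known column gets exactly one entry per record.
        for path, column in columns.items():
            column.append(flat.get(path))
    return columns


# Made-up sample records; the second one lacks the optional "lang" field.
records = [
    {"doc_id": 10, "name": {"url": "http://A", "lang": "en"}},
    {"doc_id": 20, "name": {"url": "http://C"}},
]
columns = stripe(records)

# A query over a single field reads only that field's column.
total = sum(columns["doc_id"])
```

The point of the layout is visible in the last line: summing `doc_id` touches one list and never reads the `name` fields, whereas a record-oriented layout would have to scan every whole record.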