Dremel: Interactive Analysis of Web-Scale Datasets
by Melnik et al., in Proceedings of the 36th International Conference on Very Large Data Bases (VLDB), 2010
This paper covers Dremel, a scalable, interactive ad-hoc query system for the analysis of read-only nested data.
Design characteristics:
- Dremel operates in situ: it accesses and processes data in place, directly in a storage layer such as the Google File System (GFS) or Bigtable, without first loading it into a separate system.
- As in distributed DBMSs, Dremel executes queries by means of a multi-level serving tree that (i) recursively rewrites queries and pushes them down to the lower levels of the tree, and (ii) aggregates the answers returned by the lower nodes into the query result.
- Provides an SQL-like query language.
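- Uses a column-striped storage representation, translating nested (structured) records into one column per field, so that a query reads only the columns it references. The paper's nested data model is based on Protocol Buffers.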
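The serving-tree execution can be illustrated with an aggregation query. The following is a minimal sketch, assuming the query SELECT A, COUNT(B) FROM T GROUP BY A: leaf servers compute partial counts over their own tablets, and each intermediate server (and the root) answers a rewritten query that sums its children's partial counts, i.e. SELECT A, SUM(c) over the union of the children's results, grouped by A. The `Node` class, the tree shape and the in-memory tablets are hypothetical stand-ins; real servers exchange partial results over the network and leaves read tablets from the storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical row of table T: (A, B), where B is None for SQL NULL.
Row = Tuple[str, Optional[str]]


@dataclass
class Node:
    """Hypothetical serving-tree node: leaves hold a tablet, inner nodes hold children."""
    children: List["Node"] = field(default_factory=list)
    tablet: List[Row] = field(default_factory=list)

    def execute(self) -> Dict[str, int]:
        """Evaluate SELECT A, COUNT(B) FROM T GROUP BY A over this subtree."""
        if not self.children:
            # Leaf server: scan the local tablet; COUNT(B) skips NULLs,
            # but a group whose B values are all NULL still gets a count of 0.
            counts: Dict[str, int] = {}
            for a, b in self.tablet:
                counts[a] = counts.get(a, 0) + (0 if b is None else 1)
            return counts
        # Intermediate server or root: push the query down to every child, then
        # evaluate the rewritten query, which SUMs the partial counts per group.
        total: Dict[str, int] = {}
        for child in self.children:
            for a, c in child.execute().items():
                total[a] = total.get(a, 0) + c
        return total


# Usage: a root with two intermediate servers over three leaf tablets.
root = Node(children=[
    Node(children=[Node(tablet=[("x", "1"), ("y", None)]),
                   Node(tablet=[("x", "2")])]),
    Node(children=[Node(tablet=[("y", "3"), ("x", None)])]),
])
print(root.execute())  # {'x': 2, 'y': 1}
```

Because COUNT is rewritten as a SUM of partial counts, each level only forwards one small aggregate per group instead of raw rows.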
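To keep the column-striped representation lossless for nested data, Dremel stores each column value together with a repetition level (at which repeated field in the field's path the value repeated; 0 starts a new record) and a definition level (how many of the optional or repeated fields in the path are actually present, which locates NULLs). The following minimal sketch computes these levels for a single field path of a single record. It assumes records are plain Python dicts and lists, and the `shred` function and its `(name, label)` path encoding are hypothetical; it omits writing all columns at once and reassembling records from the columns.

```python
from typing import Any, Dict, List, Optional, Tuple

# One path component: (field name, label), label in {"required", "optional", "repeated"}.
Path = List[Tuple[str, str]]
Entry = Tuple[Optional[Any], int, int]  # (value, repetition level, definition level)


def shred(record: Dict[str, Any], path: Path) -> List[Entry]:
    """Return the column stripe of one field path for one nested record."""
    out: List[Entry] = []

    def walk(node: Any, i: int, r: int, d: int) -> None:
        if i == len(path):
            out.append((node, r, d))  # reached the leaf value
            return
        name, label = path[i]
        if label == "required":
            walk(node[name], i + 1, r, d)  # required fields do not raise d
        elif label == "optional":
            child = node.get(name)
            if child is None:
                out.append((None, r, d))  # NULL: path defined only up to level d
            else:
                walk(child, i + 1, r, d + 1)
        else:  # "repeated"
            items = node.get(name) or []
            if not items:
                out.append((None, r, d))
                return
            # Repetition level of this field: repeated fields on the path so far.
            level = sum(1 for _, lab in path[: i + 1] if lab == "repeated")
            for j, item in enumerate(items):
                # The first item inherits r; later items repeat at this field's level.
                walk(item, i + 1, r if j == 0 else level, d + 1)

    walk(record, 0, 0, 0)
    return out


# Usage with a hypothetical record: Name and Language are repeated, Code is required.
record = {"Name": [
    {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}]},
    {"Url": "http://B"},
    {"Language": [{"Code": "en-gb", "Country": "gb"}]},
]}
code_path = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
print(shred(record, code_path))
# [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2)]
```

The entry `(None, 1, 1)` records that the second Name exists but has no Language, which is exactly the information a flat column of values alone would lose.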