Towards a Query Optimizer for Text-Centric Tasks

by Panagiotis Ipeirotis et al.

The article describes how to choose optimally among different crawl- and query-based execution strategies (such as scan, filter, ...) for text-centric tasks, addressing the following trade-offs:

  • query-based execution might miss relevant documents
  • scan-based strategies take a lot of time

The authors introduce several use cases and define each task as deriving tokens from a large database.

The most interesting points in terms of our current paper projects are:

  • the definition of execution time (the model includes training time)
  • the description of multiple execution strategies that are applied to the model (scan, filter, iterative set expansion, automated query generation)
  • query-based strategies outperformed crawling-based approaches for a related data classification task
  • the methodology of updating statistics at key points to adjust the plan for the rest of the execution, which relates to the reoptimization methods described by Kabra and DeWitt [1998]
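
To make the first and last points more concrete, here is a minimal sketch of how an optimizer could compare strategies by estimated execution time. It is an illustration under stated assumptions, not the cost model from the article: the names `StrategyEstimate`, `UnitCosts`, `execution_time` and `choose_strategy` are hypothetical, the additive cost formula is a simplification, and all numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StrategyEstimate:
    """Hypothetical per-strategy statistics (illustrative fields only)."""
    name: str
    training_time: float    # one-off cost, e.g. training a classifier or learning queries
    queries: int            # number of queries sent to the database
    docs_retrieved: int     # number of documents fetched
    docs_processed: int     # number of documents run through the extractor
    expected_recall: float  # estimated fraction of the target tokens obtained


@dataclass(frozen=True)
class UnitCosts:
    """Assumed per-operation costs in seconds."""
    per_query: float
    per_retrieval: float
    per_processing: float


def execution_time(s: StrategyEstimate, c: UnitCosts) -> float:
    """Assumed additive cost: training + querying + retrieval + processing."""
    return (
        s.training_time
        + s.queries * c.per_query
        + s.docs_retrieved * c.per_retrieval
        + s.docs_processed * c.per_processing
    )


def choose_strategy(
    estimates: Sequence[StrategyEstimate],
    costs: UnitCosts,
    target_recall: float,
) -> Optional[StrategyEstimate]:
    """Return the cheapest strategy expected to reach target_recall, or None."""
    feasible = [s for s in estimates if s.expected_recall >= target_recall]
    if not feasible:
        return None
    return min(feasible, key=lambda s: execution_time(s, costs))


# Made-up numbers, purely for illustration.
COSTS = UnitCosts(per_query=1.0, per_retrieval=0.5, per_processing=2.0)
ESTIMATES = [
    StrategyEstimate("scan", 0.0, 0, 1000, 1000, 1.0),
    StrategyEstimate("filtered_scan", 100.0, 0, 1000, 300, 0.9),
    StrategyEstimate("automated_query_generation", 200.0, 50, 200, 200, 0.6),
]

if __name__ == "__main__":
    best = choose_strategy(ESTIMATES, COSTS, target_recall=0.8)
    print(best.name if best else "no feasible strategy")
```

In this sketch, re-optimization in the spirit of the last point would amount to calling `choose_strategy` again at a checkpoint, with the estimates replaced by statistics observed so far and the target reduced to the remaining work. How the statistics are actually updated is left open here.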