Towards a Query Optimizer for Text-Centric Tasks


by Panagiotis Ipeirotis et al.

The article aims to choose optimally among different crawl- and query-based execution strategies (e.g., scan, filter, ...) for text-centric tasks, addressing the following trade-offs:

query-based execution might miss relevant documents
scan-based strategies take a long time, since they process every document
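
The core decision can be sketched as picking, among the candidate strategies, the fastest one whose estimated recall reaches a target. This minimal sketch assumes such estimates are already available; the `Estimate` type, the `choose_strategy` function and all numbers are hypothetical illustrations, not code or results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical per-strategy estimates, as a cost model might produce them."""
    strategy: str
    recall: float   # expected fraction of relevant tokens that will be found
    seconds: float  # expected execution time


def choose_strategy(estimates, target_recall):
    """Return the fastest strategy whose estimated recall meets the target, or None."""
    feasible = [e for e in estimates if e.recall >= target_recall]
    if not feasible:
        return None
    return min(feasible, key=lambda e: e.seconds).strategy


# Illustrative numbers only: scanning finds everything but is slow,
# query-based strategies are faster but may miss relevant documents.
estimates = [
    Estimate("scan", recall=1.0, seconds=36_000),
    Estimate("filtered scan", recall=0.8, seconds=20_000),
    Estimate("iterative set expansion", recall=0.45, seconds=2_000),
    Estimate("automated query generation", recall=0.7, seconds=5_000),
]

print(choose_strategy(estimates, 0.6))  # automated query generation
print(choose_strategy(estimates, 0.9))  # scan
```

Raising the target recall pushes the choice from the fast query-based strategies towards the exhaustive scan, which is exactly the trade-off listed above.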

The authors introduce several use cases and define each task as deriving tokens from a large database.

The points most relevant to our current paper projects are:

definition of the execution time, which also accounts for training time
description of multiple execution strategies that are evaluated with the model (scan, filtered scan, iterative set expansion, automated query generation)
query-based strategies outperformed crawling-based approaches in a related data classification task
updating statistics at key points to adjust the plan for the rest of the execution builds on reoptimization methods as described by Kabra and DeWitt [1998]
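
To illustrate why training time belongs in the execution-time model, here is a simplified additive cost model in the spirit of the paper's definition: one-off training time plus per-query and per-document costs for retrieving, filtering and processing. The `Costs` class, the `execution_time` function, the 20% filter pass rate and all cost values are hypothetical assumptions for this sketch, not the paper's exact formula or measurements.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    """Hypothetical per-operation costs in seconds (illustrative values only)."""
    per_query: float     # sending one query to the database
    per_retrieve: float  # retrieving one document
    per_filter: float    # classifying one document as useful or not
    per_process: float   # running the extraction system over one document


def execution_time(costs, train_seconds=0.0, queries=0,
                   retrieved=0, filtered=0, processed=0):
    """Additive estimate: one-off training time plus per-query and per-document costs."""
    return (train_seconds
            + queries * costs.per_query
            + retrieved * costs.per_retrieve
            + filtered * costs.per_filter
            + processed * costs.per_process)


costs = Costs(per_query=0.5, per_retrieve=0.1, per_filter=0.01, per_process=1.0)
n_docs = 100_000

# Scan: retrieve and process every document, no training needed.
scan = execution_time(costs, retrieved=n_docs, processed=n_docs)

# Filtered scan: train a classifier once, retrieve and classify every document,
# but run the expensive extractor only on the (assumed) 20% that pass the filter.
filtered_scan = execution_time(costs, train_seconds=600.0, retrieved=n_docs,
                               filtered=n_docs, processed=20_000)

print(f"{scan:.0f}")           # 110000
print(f"{filtered_scan:.0f}")  # 31600
```

With these assumed numbers the training cost pays off because it avoids 80,000 expensive extraction calls; with a cheaper extractor, a less selective filter or a much smaller database the balance can flip. A complete optimizer would also need recall estimates, which this sketch leaves out.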


Albert Weichselbraun


You may also enjoy

Big, Linked Geospatial Data and Its Application in Earth Observation

Employment relations: a data driven analysis of job markets using online job boards and online professional networks

Suffix array

Dynamic feature scaling for online learning of binary classifiers