Optimal Stopping Reloaded

2 minute read

On the use of hybrid reinforcement learning for autonomic resource allocation *** by Tesauro et al.

This paper elaborates on the use of reinforcement learning (RL) for autonomic resource allocation. It again addresses the problem of translating high-level objectives into system actions using a utility model, allowing systems to dynamically reconfigure themselves, optimize their performance, detect and repair faults, etc. [tesauro2007].

The paper provides the following contributions:

  • it discusses the difficulty of acquiring the domain knowledge required to create optimized strategies - the so-called knowledge bottleneck.
  • this lack of knowledge leads to poor performance during live online training with pure reinforcement learning approaches - the paper addresses this problem by introducing a hybrid approach: the system is controlled by a fixed policy $$p_i$$ until the reinforcement learner has derived a better policy $$p_i'$$ from the observed behavior. This step of online learning and replacing strategies may even be iterated over time.
  • They show that their hybrid RL approach is promising for systems with:
    • tractable state-space representation
    • frequent online decision making depending upon time-varying system states
    • frequent observations of numerical rewards
    • pre-existing policies that obtain acceptable (but imperfect) performance levels
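
The hybrid scheme can be sketched as a small control loop: the live system stays under the fixed policy $$p_i$$ while a tabular learner builds Q-values from the observed trace, and only the resulting greedy policy is returned as the candidate replacement $$p_i'$$. The toy environment, function names, and the use of a Q-learning update are all illustrative assumptions, not the paper's actual system:

```python
def run_hybrid(env_step, fixed_policy, states, actions,
               steps=500, alpha=0.1, gamma=0.9):
    """Learn a replacement policy while a fixed policy controls the system."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    s = states[0]
    for _ in range(steps):
        a = fixed_policy(s)                 # live system stays under p_i
        s_next, r = env_step(s, a)
        best_next = max(Q[(s_next, b)] for b in actions)
        # off-policy (Q-learning) update from the trace the fixed policy produced
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next
    # greedy policy w.r.t. the learned values: the candidate replacement p_i'
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```

With a toy two-state environment that rewards adding capacity when busy, the learned policy picks "add" in the busy state; in the paper's iterated variant, this learned policy would become the new fixed policy and the loop would repeat.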


  • reinforcement learning uses trial-and-error methods to learn the value function $$Q_{p_i}(s,a)$$, which assigns a value to action $$a$$ performed in state $$s$$. It has the following main advantages: (i) it does not need an explicit model of the domain, and (ii) it is grounded in the theory of Markov decision processes (MDPs), which is fundamentally a theory of sequential decision making.
  • Temporal Difference learning and related methods, combined with Bellman's policy improvement theorem, show that the policy converges under the stated conditions.
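
The core Temporal Difference step can be written in a few lines; this is a generic SARSA-style update of $$Q(s,a)$$ toward the observed reward plus the discounted value of the next state-action pair, not code from the paper:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One TD (SARSA) update: nudge Q(s, a) toward r + gamma * Q(s', a')."""
    td_target = r + gamma * Q.get((s_next, a_next), 0.0)
    td_error = td_target - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]
```

Repeating such updates along observed trajectories, and acting greedily with respect to the improved $$Q$$, is what Bellman's policy improvement theorem builds on.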

[tesauro2007] Tesauro, Gerald, Jong, Nicholas K., Das, Rajarshi and Bennani, Mohamed N. (2007). ''On the use of hybrid reinforcement learning for autonomic resource allocation'', Cluster Computing, Kluwer Academic Publishers, 10(3), pages 287--299

Using economic models to allocate resources in database management systems *** by Zhang et al.

This paper elaborates on an economic model to allocate multiple resources, such as memory buffer space and CPU shares, to workloads running on a DBMS. The authors apply a utility model to capture a workload's utility based on its business importance, reducing the potential complexity of the resource allocation problem [zhang2008].

The authors provide a good overview of related literature using concepts from microeconomics (Davison et al.) and economic models (Boughton et al.), and demonstrate how a broker-based trade mechanism can be applied to this problem class.

  • broker and consumers try to achieve their goals (maximize utility)
  • consumer (=workload) wealth is assigned in accordance with the workload's importance (more important workloads are wealthier and are therefore more likely to win auctions)
  • they use a simple model to capture the additional utility gained by another unit of a resource (resource $$res_i$$ is acquired as long as $$du/dres_i \gt du/dres_j$$ for any $$j \ne i$$) -> the model therefore manages to break the very complex concept of utility down to a simple representation
  • performance models show how CPU and memory buffers are distributed
  • utility is given as a fraction of the maximum throughput
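
A greedy allocation in the spirit of these bullets can be sketched as follows: each resource unit goes to the bidder whose wealth-weighted marginal utility is highest, so wealthier (more important) workloads tend to win. The diminishing-returns utility curve and all names below are my own illustrative assumptions, not the paper's exact model:

```python
def allocate(total_units, workloads):
    """Greedily auction resource units. workloads: {name: wealth}."""
    alloc = {w: 0 for w in workloads}

    def marginal_utility(units):
        # assumed diminishing-returns curve: utility(n) = n / (n + 5),
        # so each additional unit is worth less than the previous one
        u = lambda n: n / (n + 5)
        return u(units + 1) - u(units)

    for _ in range(total_units):
        # the unit goes to the consumer whose next unit is worth the most,
        # scaled by its wealth (i.e. its business importance)
        winner = max(workloads,
                     key=lambda w: workloads[w] * marginal_utility(alloc[w]))
        alloc[winner] += 1
    return alloc
```

For example, `allocate(10, {'oltp': 3.0, 'batch': 1.0})` hands most units to the more important 'oltp' workload, yet 'batch' still receives some once 'oltp''s marginal utility has fallen off.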

In conclusion, the paper shows that high-level business importance policies can be translated into resource tuning actions for a DBMS.

[zhang2008] Zhang, Mingyi, Martin, Patrick, Powley, Wendy and Bird, Paul (2008). ''Using economic models to allocate resources in database management systems'', CASCON '08: Proceedings of the 2008 conference of the center for advanced studies on collaborative research, ACM, pages 248--259