Optimal Stopping Reloaded

2 minute read

On the use of hybrid reinforcement learning for autonomic resource allocation *** by Tesauro et al.

This paper elaborates on the use of reinforcement learning (RL) for autonomic resource allocation. It again addresses the problem of translating high-level objectives into system actions using a utility model, allowing systems to dynamically reconfigure themselves, optimize their performance, detect and repair faults, etc. [tesauro2007].

The paper provides the following contributions:

  • it discusses the difficulty of acquiring the domain knowledge required to create optimized strategies - the so-called knowledge bottleneck.
  • this lack of knowledge leads to poor performance during live online training with pure reinforcement learning approaches - the paper addresses this problem by introducing a hybrid approach: the system is controlled by a fixed policy $$p_i$$ until the reinforcement learner has derived a better policy $$p_i'$$ from the observed behavior. This step of online learning and replacing strategies may even be iterated over time.
  • They show that their hybrid RL approach is promising for systems with:
    • tractable state-space representation
    • frequent online decision making depending upon time-varying system states
    • frequent observations of numerical rewards
    • pre-existing policies that obtain acceptable (but imperfect) performance levels
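
The hybrid scheme can be sketched as a small control loop: the live system stays under the fixed policy $$p_i$$ while a tabular learner builds Q-values from the observed trace, and only the resulting greedy policy is returned as the candidate replacement $$p_i'$$. The toy environment, function names, and the use of a Q-learning update are all illustrative assumptions, not the paper's actual system:

```python
def run_hybrid(env_step, fixed_policy, states, actions,
               steps=500, alpha=0.1, gamma=0.9):
    """Learn a replacement policy while a fixed policy controls the system."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    s = states[0]
    for _ in range(steps):
        a = fixed_policy(s)                 # live system stays under p_i
        s_next, r = env_step(s, a)
        best_next = max(Q[(s_next, b)] for b in actions)
        # off-policy (Q-learning) update from the trace the fixed policy produced
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next
    # greedy policy w.r.t. the learned values: the candidate replacement p_i'
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```

With a toy two-state environment that rewards adding capacity when busy, the learned policy picks "add" in the busy state; in the paper's iterated variant, this learned policy would become the new fixed policy and the loop would repeat.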


  • reinforcement learning uses trial-and-error methods to learn the value function $$Q_{p_i}(s,a)$$, which assigns a value to action $$a$$ performed in state $$s$$. It has the following main advantages: (i) it does not need an explicit model of the domain, and (ii) it is grounded in the theory of Markov decision processes (MDPs), which is fundamentally a theory of sequential decision making.
  • Temporal Difference learning and related methods, combined with Bellman's policy improvement theorem, show that the policy converges under the stated conditions.
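
The core Temporal Difference step can be written in a few lines; this is a generic SARSA-style update of $$Q(s,a)$$ toward the observed reward plus the discounted value of the next state-action pair, not code from the paper:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One TD (SARSA) update: nudge Q(s, a) toward r + gamma * Q(s', a')."""
    td_target = r + gamma * Q.get((s_next, a_next), 0.0)
    td_error = td_target - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]
```

Repeating such updates along observed trajectories, and acting greedily with respect to the improved $$Q$$, is what Bellman's policy improvement theorem builds on.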

[tesauro2007] Tesauro, Gerald, Jong, Nicholas K., Das, Rajarshi and Bennani, Mohamed N. (2007). ''On the use of hybrid reinforcement learning for autonomic resource allocation'', Cluster Computing, Kluwer Academic Publishers, 10(3), pages 287--299

Using economic models to allocate resources in database management systems *** by Zhang et al.

This paper elaborates on an economic model to allocate multiple resources, such as memory buffer space and CPU shares, to workloads running on a DBMS. The authors apply a utility model to capture a workload's utility based on its business importance, reducing the potential complexity of the resource allocation problem [zhang2008].

The authors provide a good overview of related literature using concepts from microeconomics (Davison et al.) and economic models (Boughton et al.), and demonstrate how a broker-based trade mechanism can be applied to this problem class.

  • broker and consumers try to achieve their goals (maximize utility)
  • consumer (=workload) wealth is assigned in accordance with the workload's importance (more important workloads are wealthier and are therefore more likely to win auctions)
  • they use a simple model to capture the additional utility gained by another unit of a resource (resource $$res_i$$ is acquired as long as $$du/dres_i \gt du/dres_j$$ for any $$j \ne i$$) -> the model therefore manages to break the very complex concept of utility down to a simple representation
  • performance models show how CPU and memory buffers are distributed
  • utility is given as a fraction of the maximum throughput
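
A greedy allocation in the spirit of these bullets can be sketched as follows: each resource unit goes to the bidder whose wealth-weighted marginal utility is highest, so wealthier (more important) workloads tend to win. The diminishing-returns utility curve and all names below are my own illustrative assumptions, not the paper's exact model:

```python
def allocate(total_units, workloads):
    """Greedily auction resource units. workloads: {name: wealth}."""
    alloc = {w: 0 for w in workloads}

    def marginal_utility(units):
        # assumed diminishing-returns curve: utility(n) = n / (n + 5),
        # so each additional unit is worth less than the previous one
        u = lambda n: n / (n + 5)
        return u(units + 1) - u(units)

    for _ in range(total_units):
        # the unit goes to the consumer whose next unit is worth the most,
        # scaled by its wealth (i.e. its business importance)
        winner = max(workloads,
                     key=lambda w: workloads[w] * marginal_utility(alloc[w]))
        alloc[winner] += 1
    return alloc
```

For example, `allocate(10, {'oltp': 3.0, 'batch': 1.0})` hands most units to the more important 'oltp' workload, yet 'batch' still receives some once 'oltp''s marginal utility has fallen off.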

In conclusion, the paper shows that high-level business importance policies can be translated into resource tuning actions for a DBMS.

[zhang2008] Zhang, Mingyi, Martin, Patrick, Powley, Wendy and Bird, Paul (2008). ''Using economic models to allocate resources in database management systems'', CASCON '08: Proceedings of the 2008 conference of the center for advanced studies on collaborative research, ACM, pages 248--259