Optimal Stopping Reloaded
On the use of hybrid reinforcement learning for autonomic resource allocation *** by Tesauro et al.
This paper elaborates on the use of reinforcement learning (RL) for autonomic resource allocation. Again, the paper addresses the problem of translating high-level objectives into system actions using a utility model, allowing systems to dynamically reconfigure themselves, optimize their performance, detect and repair faults, etc. [tesauro2007].
The paper provides the following contributions:
- it discusses the difficulty of acquiring sufficient domain knowledge, required to create optimized strategies - the so-called knowledge bottleneck.
- this lack of knowledge leads to poor performance during live online training with pure RL approaches. The paper addresses this problem by introducing a hybrid approach: the system is controlled by a fixed policy $$p_i$$ until the RL component, trained on the observed data, has learned a better policy $$p_i'$$. This cycle of learning and policy replacement may even be iterated over time.
- they show that their hybrid RL approach is promising for systems with:
- tractable state-space representation
- frequent online decision making depending upon time-varying system states
- frequent observations of numerical rewards
- pre-existing policies that obtain acceptable (but imperfect) performance levels
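The hybrid scheme described above can be sketched in a few lines. The following is a toy illustration, not the paper's system: a hand-written rule controls a simulated server pool while experience is logged, a value function is then trained offline on that log, and the resulting greedy policy replaces the rule. The load dynamics, reward, and the small amount of random exploration are all assumptions of this toy example.

```python
import random

STATES = range(3)   # load level: low / medium / high
ACTIONS = range(3)  # number of servers to allocate
GAMMA = 0.9

def fixed_policy(s):
    # Hand-written initial rule: over-provision by one server.
    return min(s + 1, 2)

def reward(s, a):
    # Hypothetical reward: served load minus a cost per allocated server.
    return min(s, a) - 0.4 * a

random.seed(0)

# Phase 1: the fixed policy runs the system; we only record what we observe.
# (A little random exploration is added for state-action coverage -- an
# assumption of this toy, since the fixed policy alone never tries other actions.)
trace, s = [], 0
for _ in range(5000):
    a = random.choice(list(ACTIONS)) if random.random() < 0.2 else fixed_policy(s)
    s2 = random.choice(list(STATES))  # load drifts at random in this toy
    trace.append((s, a, reward(s, a), s2))
    s = s2

# Phase 2: temporal-difference learning of Q(s, a) from the logged trace.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for sweep in range(50):
    alpha = 0.5 / (1 + sweep)  # decaying learning rate
    for s, a, r, s2 in trace:
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + GAMMA * best_next - Q[(s, a)])

# Phase 3: the learned greedy policy replaces the rule-based one.
improved_policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(improved_policy)
```

Under this toy's cost structure the learned policy allocates servers proportional to load and drops the rule's over-provisioning, which is exactly the kind of improvement the hybrid approach is after.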
Addons:
- reinforcement learning uses trial-and-error methods to learn the value function $$Q_{p_i}(s,a)$$, which assigns a value to action $$a$$ performed in state $$s$$. It has the following main advantages: (i) it does not need an explicit model of the domain, and (ii) it is grounded in the theory of Markov decision processes (MDPs), which is fundamentally a theory of sequential decision making.
- Temporal Difference (TD) learning and related methods, combined with Bellman's policy improvement theorem, show that the learned policy converges under the stated conditions.
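Concretely, the TD update referred to above takes the standard Sarsa(0) form (notation here is the textbook one, with learning rate $$\alpha$$ and discount factor $$\gamma$$; the paper's exact variant may differ):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

Acting greedily with respect to the learned $$Q$$ and re-estimating is what the policy improvement theorem guarantees to improve the policy step by step.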
[tesauro2007] Tesauro, Gerald, Jong, Nicholas K., Das, Rajarshi and Bennani, Mohamed N. (2007). ''On the use of hybrid reinforcement learning for autonomic resource allocation'', Cluster Computing, Kluwer Academic Publishers, 10(3), pages 287--299
Using economic models to allocate resources in database management systems *** by Zhang et al.
This paper elaborates on an economic model to allocate multiple resources, such as memory buffer space and CPU shares, to workloads running on a DBMS. The authors apply a utility model that captures a workload's utility based on its business importance, reducing the potential complexity of the resource allocation problem.
The authors provide a good overview of related literature using concepts from microeconomics (Davison et al.) and economic models (Boughton et al.), and demonstrate how a broker-based trading mechanism can be applied to this problem class.
- broker and consumers try to achieve their goals (maximize utility)
- consumer (= workload) wealth is assigned in accordance with the workload's importance (more important workloads are wealthier and are therefore more likely to win auctions)
- they use a simple model to capture the additional utility gained by another unit of a resource: resource $$res_i$$ is acquired as long as $$du/dres_i \gt du/dres_j$$ for all $$j \ne i$$ -> the model therefore manages to break the very complex concept of utility down into a simple representation
- performance models show how CPU shares and memory buffers are distributed
- utility is given as a fraction of the maximum throughput
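The marginal-utility rule above amounts to a greedy allocator: each unit of budget goes to the resource whose next unit raises utility the most. The sketch below illustrates this; the concave utility curves and their rates are made up for the example and are not the paper's performance model.

```python
import math

def utility(resource, units):
    # Hypothetical diminishing-returns utility, as a fraction of maximum
    # throughput; the rates are invented for this illustration.
    rate = {"cpu": 0.5, "buffer": 0.3}[resource]
    return 1.0 - math.exp(-rate * units)

def marginal(resource, units):
    # Utility gained by one more unit of this resource (du/dres).
    return utility(resource, units + 1) - utility(resource, units)

def allocate(budget):
    # Greedy rule from the notes: keep acquiring the resource res_i with
    # du/dres_i > du/dres_j for all j != i, until the budget is spent.
    alloc = {"cpu": 0, "buffer": 0}
    for _ in range(budget):
        best = max(alloc, key=lambda r: marginal(r, alloc[r]))
        alloc[best] += 1
    return alloc

print(allocate(10))  # -> {'cpu': 4, 'buffer': 6}
```

Because both curves have diminishing returns, the greedy choice equalizes the marginal utilities across resources, which is exactly what the paper's condition expresses.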
[zhang2008] Zhang, Mingyi, Martin, Patrick, Powley, Wendy and Bird, Paul (2008). ''Using economic models to allocate resources in database management systems'', CASCON '08: Proceedings of the 2008 conference of the center for advanced studies on collaborative research, ACM, pages 248--259