Reinforcement Learning

less than 1 minute read

*A reinforcement learning approach to dynamic resource allocation* by David Vengerov

Another paper on the application of the concept of 'utility' to computer science. The author states that the central issue of this kind of research is predicting the future utility of a given allocation. The paper therefore suggests using reinforcement learning to learn the utility functions.

Problem

The paper examines when a resource should be migrated between two projects $$i$$ and $$j$$. Migrating a resource from $$j$$ to $$i$$ pays off exactly when $$du_i/dres_i > du_j/dres_j$$. Trading of an infinitely divisible resource therefore stops at $$du_i/dres_i = du_j/dres_j$$, i.e. when all marginal utilities are equal. For concave increasing utility functions this condition is also sufficient for an optimal allocation.
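To make the stopping condition concrete, here is a small numerical sketch. The two concave utility functions are made up for illustration (in the paper they would be learned via reinforcement learning); the loop simply shifts small slices of an infinitely divisible resource toward the project with the higher marginal utility until the marginals match.

```python
import math

# Illustrative concave increasing utilities for projects i and j.
# These specific functions are assumptions for this sketch only.
def u_i(res):
    return math.log(1.0 + res)

def u_j(res):
    return 2.0 * math.log(1.0 + res)

def marginal(u, res, eps=1e-6):
    """Numerical marginal utility du/dres (central difference)."""
    return (u(res + eps) - u(res - eps)) / (2.0 * eps)

def rebalance(res_i, res_j, step=0.001, tol=1e-4):
    """Migrate small slices of the resource toward the project with the
    higher marginal utility until du_i/dres_i == du_j/dres_j (within tol)."""
    while abs(marginal(u_i, res_i) - marginal(u_j, res_j)) > tol:
        if marginal(u_i, res_i) > marginal(u_j, res_j):
            res_i, res_j = res_i + step, res_j - step  # j -> i pays off
        else:
            res_i, res_j = res_i - step, res_j + step  # i -> j pays off
    return res_i, res_j

res_i, res_j = rebalance(5.0, 5.0)  # ends near res_i = 3, res_j = 7
```

Because both utilities are concave and increasing, equal marginal utilities are also sufficient for the allocation to be optimal, which is why the trading loop can stop there.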

Remarks

  • the author uses a sum function to aggregate the individual utilities (with $$\lambda = 1$$)
  • the paper outlines one of the main problems of rule-based approaches: an exponential increase in the number of rules as the number of inputs/resources grows
  • a reinforcement learning algorithm is presented (containing states, actions, a reward function, and a state transition function)
  • in the experiments the utility-based policy outperformed its contenders by 26%.
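The four RL components from the list above (states, actions, a reward function, and a state transition function) can be sketched with a minimal tabular Q-learning loop on a toy allocation MDP. This is my own illustrative stand-in, not the paper's actual algorithm: the state is the number of resource units held by project $$i$$, actions migrate one unit, and the reward is the summed utility of both projects.

```python
import random

# Toy allocation MDP (an illustrative stand-in, not the paper's algorithm):
# state  = number of resource units held by project i (0..N_UNITS),
# action = migrate one unit toward i (+1), toward j (-1), or hold (0),
# reward = summed utility of both projects (lambda = 1, as in the remarks).
N_UNITS = 4
ACTIONS = [-1, 0, 1]
ALPHA, GAMMA = 0.2, 0.9

def reward(state):
    res_i, res_j = state, N_UNITS - state
    return res_i ** 0.5 + 2.0 * res_j ** 0.5  # assumed concave utilities

def transition(state, action):
    return min(max(state + action, 0), N_UNITS)

# Off-policy Q-learning driven by a uniformly random behaviour policy,
# so every (state, action) pair gets visited many times.
random.seed(0)
Q = {(s, a): 0.0 for s in range(N_UNITS + 1) for a in ACTIONS}
state = N_UNITS // 2
for _ in range(20000):
    action = random.choice(ACTIONS)
    nxt = transition(state, action)
    target = reward(nxt) + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    state = nxt

# The greedy policy migrates units toward the allocation that maximizes
# the summed utility (here res_i = 1, res_j = 3).
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_UNITS + 1)}
```

The learned greedy policy is the utility-based allocation rule: from any state it migrates resource units toward the split with the highest summed utility and then holds.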

[vengerov2007] Vengerov, David (2007). "A reinforcement learning approach to dynamic resource allocation". Engineering Applications of Artificial Intelligence, 20(3), pages 383–390. Pergamon Press, Inc.