Mike's Notes
More notes on using Markov chains. Pipi uses Markov chains.
Resources
- https://en.wikipedia.org/wiki/Bellman_equation
- https://en.wikipedia.org/wiki/Markov_decision_process
- https://en.wikipedia.org/wiki/Dynamic_programming
References
- Reference
Repository
- Home > Ajabbi Research > Library >
- Home > Handbook >
Last Updated
17/05/2025
Bellman equation
"A Bellman equation, named after Richard E. Bellman, is a necessary
condition for optimality associated with the mathematical optimization
method known as dynamic programming. It writes the "value" of a
decision problem at a certain point in time in terms of the payoff
from some initial choices and the "value" of the remaining decision
problem that results from those initial choices. This breaks a dynamic
optimization problem into a sequence of simpler subproblems, as
Bellman's "principle of optimality" prescribes. The equation applies
to algebraic structures with a total ordering; for algebraic
structures with a partial ordering, the generic Bellman's equation can
be used.
The Bellman equation was first applied to engineering control theory
and to other topics in applied mathematics, and subsequently became an
important tool in economic theory; though the basic concepts of
dynamic programming are prefigured in John von Neumann and Oskar
Morgenstern's Theory of Games and Economic Behavior and Abraham Wald's
sequential analysis. The term "Bellman equation" usually refers to the
dynamic programming equation (DPE) associated with discrete-time
optimization problems. In continuous-time optimization problems, the
analogous equation is a partial differential equation that is called
the Hamilton–Jacobi–Bellman equation.
In discrete time any multi-stage optimization problem can be solved
by analyzing the appropriate Bellman equation. The appropriate Bellman
equation can be found by introducing new state variables (state
augmentation). However, the resulting augmented-state multi-stage
optimization problem has a higher dimensional state space than the
original multi-stage optimization problem - an issue that can
potentially render the augmented problem intractable due to the "curse
of dimensionality". Alternatively, it has been shown that if the cost
function of the multi-stage optimization problem satisfies a "backward
separable" structure, then the appropriate Bellman equation can be
found without state augmentation. ..." - Wikipedia
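In symbols, the decomposition described above is usually written as follows. This is the standard textbook form of the discrete-time Bellman equation, not something stated in the quoted passage: the value of a state equals the best achievable immediate payoff plus the discounted value of the state that choice leads to.

```latex
% Standard discrete-time Bellman equation (textbook form).
% V(x)      : value of being in state x
% \Gamma(x) : set of feasible actions in state x
% F(x,a)    : immediate payoff of taking action a in state x
% T(x,a)    : next state reached from x under action a
% \beta     : discount factor, 0 < \beta < 1
V(x) = \max_{a \in \Gamma(x)} \bigl\{ F(x, a) + \beta \, V\bigl(T(x, a)\bigr) \bigr\}
```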
Markov decision process
"Originating from operations research in the 1950s, MDPs have since gained recognition in a variety of fields, including ecology, economics, healthcare, telecommunications and reinforcement learning. Reinforcement learning utilizes the MDP framework to model the interaction between a learning agent and its environment. In this framework, the interaction is characterized by states, actions, and rewards. The MDP framework is designed to provide a simplified representation of key elements of artificial intelligence challenges. These elements encompass the understanding of cause and effect, the management of uncertainty and nondeterminism, and the pursuit of explicit goals.
The name comes from its connection to Markov chains, a concept developed by the Russian mathematician Andrey Markov. The "Markov" in "Markov decision process" refers to the underlying structure of state transitions that still follow the Markov property. The process is called a "decision process" because it involves making decisions that influence these state transitions, extending the concept of a Markov chain into the realm of decision-making under uncertainty. ..." - Wikipedia
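To make the states/actions/rewards framing concrete, here is a minimal sketch of value iteration on a toy MDP. The two states, the actions, the transition probabilities, and the rewards are invented for illustration; only the repeated Bellman backup in the loop is the standard algorithm.

```python
# Minimal value-iteration sketch on a hypothetical two-state MDP.
# States, actions, transition probabilities, and rewards are made up
# for illustration; the update rule is the standard Bellman backup.

# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    "low": {
        "wait":     [(1.0, "low", 0.0)],
        "recharge": [(0.9, "high", -1.0), (0.1, "low", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 0.0)],
        "work": [(0.8, "high", 5.0), (0.2, "low", 5.0)],
    },
}

gamma = 0.9                              # discount factor
values = {s: 0.0 for s in transitions}   # initial value estimates

# Repeatedly apply the Bellman optimality backup until values stop changing.
for _ in range(1000):
    delta = 0.0
    for state, actions in transitions.items():
        best = max(
            sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
            for outcomes in actions.values()
        )
        delta = max(delta, abs(best - values[state]))
        values[state] = best
    if delta < 1e-6:
        break

print(values)  # approximate optimal state values for the toy MDP
```

The inner maximisation over actions is exactly the Bellman equation written above, applied to each state in turn; the Markov property is what lets the update depend only on the current state and not on how it was reached.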