Mike's Notes
More notes on using Markov chains. Pipi uses Markov chains.
Resources
- https://en.wikipedia.org/wiki/Bellman_equation
- https://en.wikipedia.org/wiki/Markov_decision_process
- https://en.wikipedia.org/wiki/Dynamic_programming
Repository
- Home > Ajabbi Research > Library >
- Home > Handbook >
Last Updated
17/05/2025
Bellman equation
  
      "A Bellman equation, named after Richard E. Bellman, is a necessary
          condition for optimality associated with the mathematical optimization
          method known as dynamic programming. It writes the "value" of a
          decision problem at a certain point in time in terms of the payoff
          from some initial choices and the "value" of the remaining decision
          problem that results from those initial choices. This breaks a dynamic
          optimization problem into a sequence of simpler subproblems, as
          Bellman's “principle of optimality" prescribes. The equation applies
          to algebraic structures with a total ordering; for algebraic
          structures with a partial ordering, the generic Bellman's equation can
          be used.
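In its standard discrete-time form, that recursive relationship can be written as below. The notation (state x, feasible action set Γ(x), per-period payoff F, transition function T, discount factor β) is the usual textbook convention, supplied here for illustration rather than taken from the quoted text:

    V(x) = \max_{a \in \Gamma(x)} \bigl\{ F(x, a) + \beta \, V\bigl(T(x, a)\bigr) \bigr\}

The value of the whole problem at state x is the best immediate payoff plus the discounted value of the smaller problem that remains after that choice.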
    
    
      
    
    
The Bellman equation was first applied to engineering control theory and to other topics in applied mathematics, and subsequently became an important tool in economic theory; though the basic concepts of dynamic programming are prefigured in John von Neumann and Oskar Morgenstern's Theory of Games and Economic Behavior and Abraham Wald's sequential analysis. The term "Bellman equation" usually refers to the dynamic programming equation (DPE) associated with discrete-time optimization problems. In continuous-time optimization problems, the analogous equation is a partial differential equation that is called the Hamilton–Jacobi–Bellman equation.
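For comparison, one common deterministic, infinite-horizon form of the Hamilton–Jacobi–Bellman equation is sketched below; the notation (discount rate ρ, running payoff F, dynamics f) is assumed here for illustration and is not drawn from the quoted passage:

    \rho \, V(x) = \max_{a} \bigl\{ F(x, a) + V'(x) \, f(x, a) \bigr\}, \qquad \dot{x} = f(x, a)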
    
    
In discrete time any multi-stage optimization problem can be solved by analyzing the appropriate Bellman equation. The appropriate Bellman equation can be found by introducing new state variables (state augmentation). However, the resulting augmented-state multi-stage optimization problem has a higher dimensional state space than the original multi-stage optimization problem - an issue that can potentially render the augmented problem intractable due to the "curse of dimensionality". Alternatively, it has been shown that if the cost function of the multi-stage optimization problem satisfies a "backward separable" structure, then the appropriate Bellman equation can be found without state augmentation. ..." - Wikipedia
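A minimal sketch of what "analyzing the appropriate Bellman equation" looks like in discrete time: backward induction over a small finite-horizon problem. The state space, actions, reward, and transition function below are invented purely for illustration and are not taken from the quoted material.

    HORIZON = 4                # number of decision stages (illustrative)
    states = range(6)          # toy state space: 0..5
    actions = [-1, 0, +1]      # toy actions: move down, stay put, move up

    def transition(s, a):
        """Deterministic toy dynamics, clipped to the state space."""
        return max(0, min(5, s + a))

    def reward(s, a):
        """Toy per-stage payoff: prefer high states, penalise movement."""
        return s - abs(a)

    # V[t][s] = best achievable payoff from stage t onwards, starting in state s.
    V = [[0.0] * 6 for _ in range(HORIZON + 1)]    # terminal values V[HORIZON][s] = 0
    policy = [[None] * 6 for _ in range(HORIZON)]

    for t in reversed(range(HORIZON)):             # solve the Bellman equation backwards in time
        for s in states:
            best_value, best_action = float("-inf"), None
            for a in actions:
                value = reward(s, a) + V[t + 1][transition(s, a)]   # Bellman recursion
                if value > best_value:
                    best_value, best_action = value, a
            V[t][s] = best_value
            policy[t][s] = best_action

    print(V[0])       # value of each starting state at stage 0
    print(policy[0])  # best first action from each starting state

Each stage only requires looking one step ahead, which is exactly the decomposition into simpler subproblems described above.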
    
Markov decision process
"... Originating from operations research in the 1950s, MDPs have since gained recognition in a variety of fields, including ecology, economics, healthcare, telecommunications and reinforcement learning. Reinforcement learning utilizes the MDP framework to model the interaction between a learning agent and its environment. In this framework, the interaction is characterized by states, actions, and rewards. The MDP framework is designed to provide a simplified representation of key elements of artificial intelligence challenges. These elements encompass the understanding of cause and effect, the management of uncertainty and nondeterminism, and the pursuit of explicit goals.
The name comes from its connection to Markov chains, a concept developed by the Russian mathematician Andrey Markov. The "Markov" in "Markov decision process" refers to the underlying structure of state transitions that still follow the Markov property. The process is called a "decision process" because it involves making decisions that influence these state transitions, extending the concept of a Markov chain into the realm of decision-making under uncertainty. ..." - Wikipedia
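To make the states/actions/rewards framing concrete, here is a small value-iteration sketch on a toy two-state MDP. The state names, transition probabilities, rewards, and discount factor are all made up for illustration; only the general structure reflects the MDP framework described above.

    # P[state][action] is a list of (probability, next_state, reward) triples.
    P = {
        "idle": {
            "wait": [(1.0, "idle", 0.0)],
            "work": [(0.8, "busy", 2.0), (0.2, "idle", 0.0)],
        },
        "busy": {
            "wait": [(0.6, "idle", 1.0), (0.4, "busy", 0.5)],
            "work": [(1.0, "busy", 1.5)],
        },
    }
    gamma = 0.9                          # discount factor (illustrative)
    V = {s: 0.0 for s in P}              # initial value estimates

    for _ in range(200):                 # repeatedly apply the Bellman optimality update
        V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            for s in P
        }

    # Greedy policy with respect to the converged values.
    policy = {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in P
    }
    print(V, policy)

The transition probabilities are what make this a Markov decision process rather than a plain Markov chain: the next state depends only on the current state and the chosen action.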
