How to Understand Dynamic Programming in the Discrete States

dynamic programming

A dynamic programming model is a multi-stage decision model where makes a sequence of decisions to achieve optimality. AlgoMonster will introduce how to define such a problem and portray such a model in terms on

What is dynamic programming?

Dynamic programming is in which decisions are made at multiple stages to form a sequence of decisions that determines a path of activity. The decisions made at each stage affect the set of decisions at the next stage, so each decision is important. At the same time, because this choice is sequentially influenced, the overall optimum is not achieved after the current optimal decision is made at each decision point. Taking a five-year production plan as an example, the production efficiency used in the first year will impact the equipment in the second year. Generally, the higher the production efficiency, the more significant the negative impact. At this point, how to specify the production efficiency for each year so that the five-year total production is the highest is a dynamic programming problem.

There are many dynamic programming problems, but we give them some basic concepts to portray the problem, such as phase, state, decision, and strategy. For these ideas, let’s see what they mean.


It refers to the division of the problem into parts. Each part represents a stage, such as in a five-year plan, each year will be a stage. The variable k often represents stages. K can take the natural numbers according to the time or be other without expanding first.


It refers to the background in each stage, like a five-year plan, the amount of equipment at the beginning, the amount of equipment after one year. When a specific state is determined, the development after the event is independent of the previous state, which is an essential property of the dynamic programming problem – Markovianity. What does it mean? For example, in the third year of the five-year plan, the known state is that 60% of the equipment is currently available, so how to get to 60% before the third year no longer affects the subsequent development?


It refers to a decision that can be made when a state is at a certain stage. This decision affects the state, and when a decision is made in a certain state, it determines the next step. For example, in a five-year plan, if the state of 60% of the equipment is available in the third year, and the next decision is “produce at a rate of 0.9 in the next year”. Assuming that the wear rate of equipment is 0.9, then the decision of 54% of the equipment available in the fourth year will be decided simultaneously.

Attainable state set

When located in a specific state, there will be a series of decisions to choose from. These decisions constitute a set called “allowable decision set,” deciding to reach a state, where all decisions can get the state set is called “reachable state set.”

Optimal strategy

To complete the five-year plan, you have to make a yield decision each year. The five-year sequence of five decisions is the strategy. Strategy is a sequential decision composition consisting of a series of decisions, generally from the beginning to the end. However, it is possible to start not from the beginning but a specific state at a particular stage. And the sequence of decisions from this state to the end is a sub-strategy. All strategies constitute a “set of allowed strategies,” which can achieve the optimal effect is the “optimal strategy.”

Indicator function to measure

The optimal strategy can achieve the optimal effect, then the impact of how to measure the merit? We need a quantitative indicator, which is the indicator function. An indicator function is a quantitative function, defined as a quantitative function in the whole process and all sub-processes. For example, the indicator function of the five-year plan is the final production. The optimal value function is the optimal effect that the indicator function can achieve, the maximum or the minimum.

Since a state from a particular stage, when making a specific decision, can determine the next step of the state, then the current state, decision, the next state, there must be some definite connection.

Go to solve dynamic programming

Now that the basic concepts are clear, it is time to get down to the business of solving dynamic programming problems. A typical dynamic programming problem is the phased shortest path problem. When we see a map like this, how do we look at the most straightforward path?

This map brings us to the basic principle of dynamic programming problems. If the optimal policy contains a state, then the optimal policy must include the optimal sub-strategy for that state. It sounds a bit abstract, but it is well understood when combined with the above diagram. In other words, the local optimum must compose the global optimum.


Using this can obtain the backpropagation method, one of the methods. It operates by finding all states of the stage and finding optimal strategies for these states on the stage from backward to forward. We can do it in the following way.

The last stage G (7) leaves out, and we start with the 6th stage F. F has two states, F1 and F2, and the optimal strategy from F1 to G is F1→G with a distance of 4; the optimal design from F2 to G is F2→G with a length of 3. Note that both strategies are here because they are the optimal sub-strategies of the two states, respectively. And this strategy cannot be removed just because F1→G is farther. The method cannot be out just because F1→G is further away.

Then move to the fifth stage, E, from E1 there are two roads E1 → F1 → G (7) and E1 → F2 → G (8), the first one is better, at this time, the optimal strategy for E1 is E1 → F1 → G (7). Similarly, E2 has two roads, respectively E2 → F1 → G (9) and E2 → F2 → G (5), also retain the optimal strategy E2 → F2 → G (5). For E3, the optimal approach is E2 → F2 → G (9).

The key lies in the next stage D. To find the optimal strategy from D1 to G, do we still need to try D1→E1→F1→G one by one and try all the paths once? It is unnecessary because, after D1 to E1, the optimal sub-strategy behind E1 is already available. We need to compare D1→E1→G (9) and D1→E2→G (7), and we can ignore the decision behind E. The same is true for D2 and D3.

The solving process of dynamic programming

At this point, probably the operation of the backpropagation method is also understood. It is to find the optimal sub-strategy of a state and then backpropagation of the previous state with its set of reachable states on it. Now, we use the previous basic concept to portray this dynamic programming problem.

Stages, defined here as the major classes of locations A, B, C, D, E, F, and G, are represented by stages 1 to 7, respectively.

The state, defined here as the specific location, for example, at stage 2 (B), is located at B2, the state is B2 at this time, and the set of reachable states of B2 is naturally C2, C3, C4.

A decision, defined here as a choice made at a particular stage in a specific state (e.g., B2), such as the choice to go from B2 to C2, is a decision. Allowing the set of decisions is the set of three decisions B2→C2, B2→C3, B2→C4.

Strategy, defined here as the set of all decisions sequential from the starting point A to the endpoint G, such as the sequence composed of six decisions A→B1, B1→C2, C2→D2, D2→E3, E3→F2, F2→G, also written as A→B1→C2→D2→E3→F2→G.

The indicator function, it defined here as the total distance traveled by a particular strategy.

Optimal function, it defined here as the minimum of all indicator functions.

The state transition equation, it defined here as the choice of a certain path to the next state when in a certain state.

In general, some optimal sub-functions can compose the recurrence relation. And this recurrence relation is known as the fundamental equation of dynamic programming. With the inverse order solution, it is possible to explore the sequential solution. Like, find the optimal strategy from stage 1 to stage 2, then 3, 4, 5, and more.

Leave a Reply

Your email address will not be published.