Deep Reinforcement Learning


History of Deep Reinforcement Learning

Researchers have been developing the ideas behind deep reinforcement learning for about 60 years:

In 1956, Richard Bellman proposed the dynamic programming equation (now known as the Bellman equation), which was later used to update the Q-table in Q-learning.

The dynamic programming equation is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It expresses the value of a decision problem at a given point in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those choices.
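As a minimal sketch of how the dynamic programming (Bellman) equation is used to update a Q-table, the snippet below applies the standard tabular Q-learning update, Q(s, a) <- Q(s, a) + alpha * [r + gamma * max Q(s', a') - Q(s, a)]. The function name `q_update`, the learning rate `alpha`, the discount factor `gamma`, and the two-state table are illustrative assumptions and do not come from the original text.

```python
from typing import Dict, List


def q_update(q: Dict[int, List[float]], state: int, action: int,
             reward: float, next_state: int,
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bellman target: immediate payoff plus the discounted value
    # of the best action in the remaining decision problem.
    target = reward + gamma * max(q[next_state])
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical table with two states and two actions per state.
q_table = {0: [0.0, 0.0], 1: [0.0, 2.0]}
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
print(new_value)  # approximately 1.4
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and the entry moves halfway from 0 toward it.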

[Application]

A well-known application of this equation is the intertemporal capital asset pricing model of Robert C. Merton. The solution to Merton's theoretical model, in which investors choose between income today and future income or capital gains, is a form of Bellman's equation.
Nancy Stokey, Robert E. Lucas, and Edward Prescott described stochastic and non-stochastic dynamic programming in considerable detail, and this led to various applications in economics, such as optimal economic growth, resource extraction, principal–agent problems, public finance, business investment, asset pricing, factor supply, and industrial organization.

In the 1970s, Harry Klopf wrote reports arguing for systems that could learn, rather than merely memorize given examples as supervised learning does.
In 2014, David Silver and colleagues proposed deterministic policy gradient methods. This technique addressed problems that arose in traditional approaches. For example, continuous states and actions could be handled in a relatively simple way.
In 2016, Richard Sutton and Andrew Barto released a draft of the second edition of Reinforcement Learning: An Introduction.

Applications of Deep Reinforcement Learning

Reinforcement learning is used to solve problems that involve a trade-off between long-term and short-term rewards. The framework has been applied to a range of real-world cases, including robot control, robotics, inventory management, resource allocation, and finance.
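The trade-off between long-term and short-term reward is commonly expressed through a discount factor. The sketch below shows how the choice of discount factor changes which of two plans looks better. The helper `discounted_return`, the parameter `gamma`, and both reward sequences are made up for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of rewards, each weighted by gamma raised to its time step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Two hypothetical plans: a small immediate payoff versus a larger delayed one.
short_term = [5.0, 0.0, 0.0, 0.0]
long_term = [0.0, 0.0, 0.0, 10.0]

for gamma in (0.5, 0.9):
    best = max(("short_term", short_term), ("long_term", long_term),
               key=lambda plan: discounted_return(plan[1], gamma))
    print(gamma, best[0])
```

With `gamma = 0.5` the delayed payoff is worth only 10 * 0.125 = 1.25, so the short-term plan wins. With `gamma = 0.9` it is worth 10 * 0.729 = 7.29, so the long-term plan wins.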

Robot Control: iterative learning control[1] is applied in robotic systems, using a model of the dynamics to correct errors in trajectories. However, most industrial robotic systems still perform a fixed motion repeatedly, with simple perception or none at all.

([1] Bristow, Douglas, Marina Tharayil, and Andrew G. Alleyne. A survey of iterative learning control)

* image source: Marketsandmarkets

Robotics: the reinforcement learning framework is applied to control the torque at the joints of robots that are meant to move as humans or animals do. The system reads observations from various sensors at each action, and it rewards correct decisions, such as navigating accurately to a target location.

[pic 1]

* image source: Boston Dynamics

Inventory Management: for manufacturing companies, the inventory control system is critical. Reinforcement learning helps determine how much inventory a company should purchase at each inventory level.
Resource Allocation: in a call center, the manager is responsible for allocating human resources, namely the call operators, efficiently. RL helps the operation assign the right operators on the basis of 'who to serve first'.
Finance: RL is applied in capital markets every day. Especially for investment decisions, portfolio design, and option/asset pricing, a typical RL framework observes the real market and assigns rewards to different investment trade-offs.
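To make the inventory management case more concrete, the following is a minimal sketch that computes an order quantity for each stock level using value iteration on a toy model. It is not a description of any production system. The capacity, demand distribution, prices, costs, and the names `action_value` and `solve_inventory` are all hypothetical. A learning-based method such as Q-learning would estimate the same quantities from experience instead of from a known demand model.

```python
from typing import Dict, List

MAX_STOCK = 3                                    # hypothetical warehouse capacity
DEMAND_PROBS = {0: 0.3, 1: 0.4, 2: 0.3}          # hypothetical demand distribution
PRICE, ORDER_COST, HOLDING_COST = 4.0, 2.0, 0.5  # hypothetical unit economics
GAMMA = 0.9                                      # discount factor (assumed)


def action_value(stock: int, order: int, values: List[float]) -> float:
    """Expected reward of ordering `order` units plus discounted future value."""
    available = stock + order
    total = 0.0
    for demand, prob in DEMAND_PROBS.items():
        sold = min(available, demand)
        leftover = available - sold
        reward = PRICE * sold - ORDER_COST * order - HOLDING_COST * leftover
        total += prob * (reward + GAMMA * values[leftover])
    return total


def solve_inventory(sweeps: int = 200) -> Dict[int, int]:
    """Run value iteration and return the greedy order quantity per stock level."""
    values = [0.0] * (MAX_STOCK + 1)
    for _ in range(sweeps):
        # Bellman backup: best achievable value at each stock level.
        values = [max(action_value(s, a, values) for a in range(MAX_STOCK - s + 1))
                  for s in range(MAX_STOCK + 1)]
    # Orders are limited so that stock never exceeds capacity.
    return {s: max(range(MAX_STOCK - s + 1),
                   key=lambda a: action_value(s, a, values))
            for s in range(MAX_STOCK + 1)}


policy = solve_inventory()
print(policy)
```

The returned dictionary maps each stock level to an order quantity. At full stock the only feasible order is zero, and with these assumed numbers an empty warehouse always orders at least one unit.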


By: graghav

Submitted: November 28, 2017

Essay Length: 496 Words / 2 Pages

Paper type: Business Plan
