
Course catalog: Prediction and Control with Function Approximation Training
Course outline:

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization: Prediction and Control with Function Approximation, brought to you by the University of Alberta, Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function, how to specify an objective function, and how stochastic gradient descent can be used to estimate values from interaction with the world.
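As a rough illustration of estimating values with a parametric form, the sketch below applies semi-gradient TD(0) to a linear value function v(s) = w · x(s). The two-state chain, the one-hot feature matrix, and all names and step sizes are assumptions made for this example; they are not taken from the course materials.

```python
import numpy as np

def semi_gradient_td0(features, transitions, alpha=0.1, gamma=1.0, sweeps=200):
    """Linear semi-gradient TD(0).

    features:    array of shape (n_states, d), one feature vector per state
    transitions: list of (state, reward, next_state, done) tuples
    Returns the learned weight vector w, so that v(s) = features[s] @ w.
    """
    w = np.zeros(features.shape[1])
    for _ in range(sweeps):
        for s, r, s_next, done in transitions:
            v = features[s] @ w
            # The value of a terminal state is defined to be zero.
            v_next = 0.0 if done else features[s_next] @ w
            td_error = r + gamma * v_next - v
            # For a linear value function the gradient is the feature vector.
            w += alpha * td_error * features[s]
    return w

# Hypothetical chain: state 0 -> state 1 (reward 0), state 1 -> terminal (reward 1).
features = np.eye(2)
transitions = [(0, 0.0, 1, False), (1, 1.0, 0, True)]
w = semi_gradient_td0(features, transitions)
```

With one-hot features this reduces to tabular TD(0). Replacing `features` with a matrix that has fewer columns than states is what lets the estimate fit in less memory than one entry per state.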

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) a fixed basis that forms an exhaustive partition of the input, and (2) adapting the features while the agent interacts with the world via neural networks and backpropagation.
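One common instance of the first strategy is tile coding, where several offset grids each partition the input and a state activates exactly one tile per grid. The sketch below is a minimal one-dimensional version; the function name, the offset scheme, and the default sizes are assumptions for illustration, not the course's implementation.

```python
import numpy as np

def tile_features(x, low, high, n_tilings=4, n_tiles=8):
    """Binary feature vector for a scalar x in [low, high].

    Each tiling is a uniform grid shifted by a fraction of one tile width,
    and contributes exactly one active feature.
    """
    x = min(max(x, low), high)                       # clip to the valid range
    scaled = (x - low) / (high - low) * n_tiles      # position in tile units
    phi = np.zeros(n_tilings * (n_tiles + 1))        # one extra tile absorbs the offset
    for t in range(n_tilings):
        offset = t / n_tilings
        idx = min(int(np.floor(scaled + offset)), n_tiles)
        phi[t * (n_tiles + 1) + idx] = 1.0
    return phi

near_a = tile_features(0.50, 0.0, 1.0)
near_b = tile_features(0.51, 0.0, 1.0)
far = tile_features(1.00, 0.0, 1.0)
```

Nearby inputs share active tiles and therefore generalize to each other, while distant inputs share none.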

In this week's graded assessment you will solve a simple but infinite-state prediction task with a neural network and TD learning.

Control with Approximation

This week you will see that the concepts and tools introduced in modules two and three allow a straightforward extension of classic TD control methods to the function approximation setting. In particular, you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.

We conclude with a discussion of a new problem formulation for RL, average reward, which will undoubtedly be used in many applications of RL in the future.
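To make the control updates concrete, the sketch below shows a single semi-gradient Sarsa step for a linear action-value function, together with a differential variant for the average-reward setting. The weight layout (one row per action), the function names, and the step sizes are assumptions chosen for this example.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def sarsa_update(w, phi, a, r, phi_next, a_next, done, alpha=0.1, gamma=0.99):
    """One episodic semi-gradient Sarsa step; w has shape (n_actions, d).

    Updates w in place and returns the TD error.
    """
    q = w[a] @ phi
    q_next = 0.0 if done else w[a_next] @ phi_next
    td_error = r + gamma * q_next - q
    w[a] += alpha * td_error * phi
    return td_error

def differential_sarsa_update(w, avg_reward, phi, a, r, phi_next, a_next,
                              alpha=0.1, beta=0.01):
    """One differential semi-gradient Sarsa step for continuing tasks.

    There is no discounting; rewards are measured relative to a running
    estimate of the average reward. Returns (new_avg_reward, td_error).
    """
    td_error = r - avg_reward + w[a_next] @ phi_next - w[a] @ phi
    avg_reward += beta * td_error
    w[a] += alpha * td_error * phi
    return avg_reward, td_error

# Hypothetical single transition with 2 actions and 3 features.
phi = np.array([1.0, 0.0, 0.0])
w = np.zeros((2, 3))
delta = sarsa_update(w, phi, a=0, r=1.0, phi_next=phi, a_next=1, done=True)

w_avg = np.zeros((2, 3))
avg_reward, delta_avg = differential_sarsa_update(w_avg, 0.0, phi, 0, 1.0, phi, 1)
```

A Q-learning step would differ only in the bootstrap target, which uses the maximum action value in the next state instead of the value of the action actually selected.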

Policy Gradient

Every algorithm you have learned about so far estimates a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods and their advantages over value-function-based methods.

You will also learn how policy gradient methods can be used to find the optimal policy in tasks with both continuous state and action spaces.
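As a minimal sketch of learning policy parameters directly, the code below takes one gradient step on a softmax policy with linear action preferences. It uses a small discrete action set for brevity; continuous actions would need a different parameterization, such as a Gaussian policy. The names and the `advantage` signal passed in are assumptions for this example.

```python
import numpy as np

def softmax_policy(theta, phi):
    """Action probabilities from linear preferences; theta has shape (n_actions, d)."""
    prefs = theta @ phi
    prefs = prefs - prefs.max()          # subtract the max for numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def grad_log_policy(theta, phi, action):
    """Gradient of log pi(action | state) with respect to theta."""
    probs = softmax_policy(theta, phi)
    indicator = np.zeros_like(probs)
    indicator[action] = 1.0
    # Row b of the gradient is (1[b == action] - pi(b | s)) * phi.
    return np.outer(indicator - probs, phi)

def policy_gradient_step(theta, phi, action, advantage, alpha=0.1):
    """Move theta so that `action` becomes more likely when advantage > 0."""
    return theta + alpha * advantage * grad_log_policy(theta, phi, action)

# Hypothetical state with 2 actions and 2 features.
phi = np.array([1.0, 0.0])
theta = np.zeros((2, 2))
new_theta = policy_gradient_step(theta, phi, action=0, advantage=1.0)
new_probs = softmax_policy(new_theta, phi)
```

Here `advantage` stands in for whatever learning signal is available, for example a sampled return minus a baseline, or a TD error supplied by a critic.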
