CSCA 5912: Deep Reinforcement Learning: From Theory to Practice
Preview this course in the non-credit experience today!
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you upgrade and pay tuition. See How It Works for details.
Course Type: MS-AI Breadth, MS-CS Elective
Specialization: Reinforcement Learning
Instructor: Dr. Ashutosh Trivedi, Associate Professor of Computer Science
Prior knowledge needed: TBD
Learning Outcomes
- Explain how neural-network-based function approximation extends reinforcement learning beyond finite tabular settings.
- Implement and evaluate value-based deep reinforcement learning algorithms, including Deep Q-Networks and stabilizing techniques such as replay buffers and target networks.
- Derive and implement policy-gradient methods, including REINFORCE, baselines, and advantage-based updates.
- Explain and analyze actor-critic methods that combine policy optimization with value estimation.
- Compare deep reinforcement learning algorithms in terms of stability, scalability, sample efficiency, and suitability for different decision-making tasks.
Course Grading Policy
| Assessment | Percentage of Grade | AI Usage Policy |
|---|---|---|
| Quizzes (5) | 50% (10% each) | Conditional |
| Final Exam | 50% | Conditional |
Course Content
Duration: TBD
This module introduces function approximation as the transition point from tabular reinforcement learning to deep reinforcement learning. The central message is that deep RL is not merely supervised learning applied to RL data: the targets are noisy, bootstrapped, policy-dependent, and often moving as the parameters change.
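To illustrate what "bootstrapped" and "moving" targets mean, here is a minimal sketch of a semi-gradient TD(0) update with a linear value function. It is not taken from the course materials. The function name, the feature-vector representation, and the default step size and discount are assumptions chosen for illustration.

```python
def semi_gradient_td0_update(weights, features, reward, next_features, done,
                             alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step for a linear value function v(s) = w . x(s).

    Hypothetical helper for illustration. The target r + gamma * v(s') is
    computed from the current weights, so it shifts every time the weights
    change; the gradient is taken only through v(s), not through the target.
    """
    value = sum(w * x for w, x in zip(weights, features))
    # No bootstrapping from a terminal state.
    next_value = 0.0 if done else sum(w * x for w, x in zip(weights, next_features))
    td_error = reward + gamma * next_value - value
    new_weights = [w + alpha * td_error * x for w, x in zip(weights, features)]
    return new_weights, td_error
```

In supervised regression the label for an input is fixed. Here the label depends on the same weights being trained, which is one reason deep RL needs extra stabilizing machinery.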
Duration: TBD
This module develops value-based deep reinforcement learning as bootstrapped regression. Learners study fitted value iteration, understand why approximation can break contraction arguments, and then study DQN and its stabilizers: replay buffers, target networks, double DQN, dueling networks, and prioritized replay.
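As a rough sketch of two of the stabilizers named above, the following shows a uniform replay buffer and how regression targets might be computed from a separate target network's Q-values. It is not the course's implementation. The class and function names are hypothetical, and Q-values are passed in as plain lists rather than produced by a neural network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the temporal correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_target(reward, next_q_target, done, gamma=0.99):
    """y = r + gamma * max_a Q_target(s', a), with no bootstrap at terminal states."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Holding the target network's parameters fixed between periodic syncs keeps the regression target from shifting on every gradient step, and separating action selection from action evaluation is intended to reduce the overestimation introduced by the max operator.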
Duration: TBD
This module introduces direct policy optimization. The main idea is to optimize a parameterized policy by estimating the gradient of expected return from sampled trajectories. Learners derive the likelihood-ratio estimator, understand causality and baselines, implement REINFORCE, and then move to actor-critic methods.
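For orientation, the likelihood-ratio estimator can be written as follows for an undiscounted, finite-horizon objective. The notation (trajectory \tau, horizon T, reward-to-go G_t, baseline b) is an assumption made here and may differ from the notation used in the course.

```latex
\begin{align*}
J(\theta) &= \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T-1} r_t\right], \\
\nabla_\theta J(\theta)
  &= \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T-1}
     \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(G_t - b(s_t)\bigr)\right],
  \qquad G_t = \sum_{t'=t}^{T-1} r_{t'}.
\end{align*}
```

Replacing the full return with the reward-to-go G_t reflects causality: an action cannot affect rewards received before it. Subtracting a state-dependent baseline b(s_t) leaves the expectation unchanged and is typically used to reduce variance. Choosing b as a learned value estimate leads toward the advantage-based and actor-critic updates mentioned above.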
Duration: TBD
This module surveys modern deep reinforcement learning algorithms through the lens of stability, exploration, and continuous control. Learners study PPO as a conservative policy-gradient method, DDPG as a deterministic actor-critic method for continuous control, and SAC as an entropy-regularized actor-critic method.
Duration: TBD
This module has two lessons. The first focuses on stable policy updates through trust-region ideas and PPO clipping. The second focuses on continuous control and entropy-regularized learning through DDPG and SAC.
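As a small illustration of PPO clipping, the sketch below computes the clipped surrogate for a single sample. It assumes the commonly used form of the objective; the function name and the default clipping range of 0.2 are illustrative choices, not values stated by the course.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s). Taking the minimum of the unclipped
    and clipped terms removes any incentive to push the ratio outside
    [1 - epsilon, 1 + epsilon] in the direction that improves the objective.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, with a positive advantage and a ratio of 1.5, the objective is capped at 1.2 times the advantage, so further increasing the action's probability yields no additional gain. With a negative advantage and the same ratio, the unclipped (more pessimistic) term is kept.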
Duration/description: TBD
Notes
- Cross-listed Courses: Courses that are offered under two or more programs. They are considered equivalent when evaluating progress toward degree requirements. You may not earn credit for more than one version of a cross-listed course.
- Page Updates: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.