ONLINE COURSE ON REINFORCEMENT LEARNING (3:0)
CCE-PROFICIENCE MAY – JULY 2026
Duration
3 months (May to July 2026)
Schedule
Every Saturday, 2:00 P.M. to 5:30 P.M., with a 30-minute break in between
Mode of Delivery
Online
Examination Period
31 July to 9 August 2026
Classes Start
Around 4 May 2026
Objectives of the Course
Reinforcement learning refers to a class of techniques that combine aspects of optimal control, simulation- and data-driven optimization, and approximation methods to solve problems of dynamic decision making under uncertainty when the model of the underlying system and its processes is unknown. Most of the algorithms and techniques used here are model-free and therefore require no explicit knowledge of the system dynamics or of the protocols in use. Reinforcement Learning thus finds applications in diverse areas such as Adaptive Control, Signal Processing, Manufacturing, Communication and Wireless Networks, Autonomous Systems and Data Mining. The course aims to build a strong foundation in Reinforcement Learning through its core tools, techniques and algorithms, and to cover state-of-the-art Deep Reinforcement Learning algorithms that use simulation-based neural network methods.
Syllabus
Introduction to Reinforcement Learning, Multi-armed Bandits, Markov Decision Processes, Dynamic Programming: Value and Policy Iteration Methods, Model-Free Learning Approaches, Monte-Carlo Methods, Temporal Difference Learning, Q-learning, SARSA, Double Q-learning, Value Function Approximation Methods, TD Learning with Linear Function Approximation, Neural Network Architectures, Deep Q-Network Algorithm, Policy Gradient Methods, Actor-Critic Algorithms.
Minimum Qualification Required of Candidates
B.Tech (any discipline); B.Sc in Mathematics / Statistics / Computer Science / Physics / Data Science.
Course Plan
Week 1: Introduction to Reinforcement Learning – examples and applications
Week 2: Multi-armed Bandits – action selection strategies
Week 3: Multi-armed Bandits – algorithms; Introduction to Markov Decision Processes
Week 4: Markov Decision Processes – examples and formulations
Week 5: Numerical approaches for Markov Decision Processes
Week 6: Model-free Monte-Carlo Reinforcement Learning algorithms for prediction
Week 7: Monte-Carlo algorithms for control; Temporal Difference methods
Week 8: One-step and n-step Temporal Difference learning, Q-learning, SARSA, Expected SARSA, Double Q-learning
Week 9: Function approximation methods, TD learning/SARSA with linear function approximation
Week 10: Neural network architectures, Deep Q-learning
Week 11: Introduction to policy gradient methods – basic principles and results
Week 12: Policy gradient algorithms – REINFORCE, Actor-Critic
Reference Books
1. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018
2. Recent papers (to be shared in class)

Course Fee
| Particulars | Amount (₹) |
| --- | --- |
| Course Fee | 15,000 |
| Application Fee | 300 |
| GST @ 18% (on 15,300) | 2,754 |
| Total | 18,054 |

