ONLINE COURSE ON REINFORCEMENT LEARNING (3:0)

CCE-PROFICIENCE MAY – JULY 2026

Duration

3 months (May – July 2026)

Schedule

Every Saturday, 2 P.M. to 5:30 P.M., with a 30-minute break in between.

Course offered

Online

Exam Duration

31 July to 9 August 2026

Classes Start

~4 May 2026

Objectives of the Course

Reinforcement Learning refers to a class of techniques that combine aspects of optimal control, simulation- and data-driven optimization, and approximation methods for problems of dynamic decision making under uncertainty, when the model of the underlying system and its processes is unknown. A large portion of the algorithms and techniques used here are model-free and, as a result, need no knowledge of the system dynamics or the protocols in use. Reinforcement Learning thus finds applications in diverse areas such as Adaptive Control, Signal Processing, Manufacturing, Communication and Wireless Networks, Autonomous Systems and Data Mining. The objective of this course is to provide a strong foundation in Reinforcement Learning through its tools, techniques and algorithms, and to cover state-of-the-art algorithms in Deep Reinforcement Learning involving simulation-based neural network methods.

Syllabus

Introduction to Reinforcement Learning, Multi-armed Bandits, Markov Decision Processes, Dynamic Programming: Value and Policy Iteration Methods, Model-Free Learning Approaches, Monte-Carlo Methods, Temporal Difference Learning, Q-learning, SARSA, Double Q-learning, Value Function Approximation Methods: TD Learning with Linear Function Approximation, Neural Network Architectures, Deep Q-Network Algorithm, Policy Gradient Methods, Actor-Critic Algorithms.

Minimum Qualification Required of Candidates

B.Tech (any discipline); B.Sc in Mathematics / Statistics / Computer Science / Physics / Data Science.

Course Plan for Reinforcement Learning:

Week 1: Introduction to Reinforcement Learning – examples and applications
Week 2: Multi-armed Bandits – action selection strategies
Week 3: Multi-armed Bandits – algorithms; Introduction to Markov Decision Processes
Week 4: Markov Decision Processes – examples and formulations
Week 5: Numerical approaches for Markov Decision Processes
Week 6: Monte-Carlo model-free Reinforcement Learning algorithms for prediction
Week 7: Monte-Carlo algorithms for control; Temporal Difference methods
Week 8: One-step and n-step Temporal Difference Learning, Q-learning, SARSA, Expected SARSA, Double Q-learning
Week 9: Function Approximation methods, TD Learning/SARSA with Linear Function Approximation
Week 10: Neural network architectures, Deep Q-learning
Week 11: Introduction to policy gradient methods – basic principles and results
Week 12: Policy gradient algorithms – REINFORCE, Actor-Critic

Reference Books

1. R. Sutton and A. Barto, Reinforcement Learning: An Introduction, 2nd edition, MIT Press, 2018
2. Recent papers (to be shared in class)

Know The Facilitators

Shalabh Bhatnagar

Professor

Department of Computer Science and Automation, Indian Institute of Science.

Course Fee

Particulars                                    Amount
Course Fee                                     15,000
Application Fee                                   300
GST @ 18% (on course and application fees)      2,754
Total                                          18,054