Teaching material for the course "Machine and Reinforcement Learning in Control Applications"
University of Rome "Tor Vergata", academic year 2021/2022
- Corrado Possieri, corrado.possieri@uniroma2.it
- Alessandro Tenaglia, alessandro.tenaglia@uniroma2.it

`DynamicProgramming/Src/` contains the code to implement Dynamic Programming algorithms:
- `PolicyIter.m` is a class that implements the Policy Iteration algorithm;
- `ValueIter.m` is a class that implements the Value Iteration algorithm;
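
The two Dynamic Programming algorithms can be illustrated on a tiny tabular MDP. The following is a minimal Python sketch, not a translation of the MATLAB classes `PolicyIter.m` and `ValueIter.m`. It assumes a finite MDP stored as NumPy arrays `P[a, s, s2]` (transition probabilities) and `R[a, s]` (expected rewards). The function names and the two-state example are hypothetical. Policy iteration here evaluates each policy exactly with a linear solve. The repository's classes may instead use iterative sweeps.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Hypothetical helper. P[a, s, s2]: transition probabilities; R[a, s]: expected reward."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * P @ V          # Bellman optimality backup, shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new


def policy_iteration(P, R, gamma=0.9):
    """Hypothetical helper using the same P and R layout as value_iteration."""
    n_states = P.shape[1]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
        P_pi, R_pi = P[policy, states], R[policy, states]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V
        new_policy = (R + gamma * P @ V).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy


# Two states, two actions: action 0 = stay, action 1 = switch state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = 1.0      # stay
P[1, 0, 1] = P[1, 1, 0] = 1.0      # switch
R = np.array([[0.0, 2.0],          # reward for staying in state 0 / state 1
              [1.0, 0.0]])         # reward for switching from state 0 / state 1
print(value_iteration(P, R))
print(policy_iteration(P, R))
```

With `gamma = 0.9`, both functions should return values close to `[19, 20]` and the policy `[1, 0]`. That policy switches from state 0 and then stays in state 1, collecting a reward of 2 forever.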

`Formula1/Src` contains the code to solve the F1 problem:
- `f1_dp.m` is a script that solves the F1 problem with Dynamic Programming algorithms;
- `f1_main.m` is a script that shows the F1 track;
- `f1_mc.m` is a script that solves the F1 problem with Monte Carlo methods;
- `f1_mdp.m` is a script that defines the F1 problem as an MDP;
- `f1_track.m` is a script that generates a Grid World from an image of an F1 track;

`JacksCarRental/Src` contains the code to solve Jack's car rental problem:
- `JCR.m` is a class that implements Jack's car rental problem;
- `jcr_dp.m` is a class that solves Jack's car rental problem with Dynamic Programming algorithms;
- `jcr_mdp.m` is a script that defines Jack's car rental problem as an MDP;

`MonteCarlo/Src` contains the code to implement Monte Carlo methods:
- `MonteCarlo` is a class that implements Monte Carlo methods;
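
The core of Monte Carlo prediction is averaging sampled returns. The following minimal Python sketch shows first-visit Monte Carlo estimation of a state-value function. It assumes each episode is given as a list of `(state, reward)` pairs. The function name and data layout are hypothetical and do not reflect the interface of the `MonteCarlo` class, which may also cover control.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Hypothetical helper: estimate V(s) as the average first-visit return.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received on the transition that leaves that state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)          # time of the first visit to s
        G = 0.0
        for t in reversed(range(len(episode))):  # accumulate returns backwards
            s, r = episode[t]
            G = gamma * G + r
            if first_visit[s] == t:               # only count the first visit
                returns_sum[s] += G
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


episodes = [[("A", 1.0), ("B", 1.0)],
            [("B", 0.0), ("A", 0.0), ("B", 2.0)]]
print(first_visit_mc(episodes))   # {'B': 1.5, 'A': 2.0}
```

State `B` appears twice in the second episode, but only its first visit (return 2) is averaged. An every-visit estimate would therefore differ.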

`MultiArmBandit/Src` contains the code to solve the multi-armed bandit problem:
- `Bandit.m` is a class that implements a multi-armed bandit in different scenarios;
- `Policy.m` is an abstract class that defines a template for sample-average policies;
- `EpsGreedy/` contains the code of the ε-greedy policy:
  - `EpsGreedy.m` is a class that implements the ε-greedy policy;
  - `eps_run.m` is a script that shows the behavior of the ε-greedy policy;
  - `eps_main.m` is a script that compares the ε-greedy policy in different scenarios;
- `UpConfBound/` contains the code of the upper confidence bound policy:
  - `UpConfBound.m` is a class that implements the upper confidence bound policy;
  - `ucb_run.m` is a script that shows the behavior of the upper confidence bound policy;
  - `ucb_main.m` is a script that compares the upper confidence bound policy in different scenarios;
- `PrefUp/` contains the code of the preference updates policy:
  - `PrefUp.m` is a class that implements the preference updates policy;
  - `pref_run.m` is a script that shows the behavior of the preference updates policy;
  - `pref_main.m` is a script that compares the preference updates policy in different scenarios;
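
The three bandit policies differ mainly in how they select an arm and update their estimates. The following minimal Python sketch illustrates those rules on a stationary Gaussian bandit. The hypothetical `SampleAverage` base class loosely plays the role described for `Policy.m`, but all class names, default parameters (`eps=0.1`, `c=2.0`, `alpha=0.1`) and the unit-variance reward model are assumptions. The preference updates policy is written here as a gradient bandit with an average-reward baseline.

```python
import numpy as np


class SampleAverage:
    """Hypothetical base class: action values estimated by sample averages."""
    def __init__(self, k):
        self.Q = np.zeros(k)   # estimated action values
        self.N = np.zeros(k)   # number of times each action was taken

    def update(self, a, r):
        self.N[a] += 1
        self.Q[a] += (r - self.Q[a]) / self.N[a]   # incremental sample average


class EpsGreedy(SampleAverage):
    def __init__(self, k, eps=0.1):
        super().__init__(k)
        self.eps = eps

    def select(self, rng):
        if rng.random() < self.eps:
            return int(rng.integers(len(self.Q)))   # explore uniformly
        return int(np.argmax(self.Q))               # exploit (ties -> lowest index)


class UCB(SampleAverage):
    def __init__(self, k, c=2.0):
        super().__init__(k)
        self.c, self.t = c, 0

    def select(self, rng):
        self.t += 1
        if np.any(self.N == 0):
            return int(np.argmin(self.N))           # try every arm once first
        bonus = self.c * np.sqrt(np.log(self.t) / self.N)
        return int(np.argmax(self.Q + bonus))


class PrefUpdate:
    """Gradient bandit: softmax over preferences H with an average-reward baseline."""
    def __init__(self, k, alpha=0.1):
        self.alpha, self.H = alpha, np.zeros(k)
        self.baseline, self.t = 0.0, 0

    def probs(self):
        z = np.exp(self.H - self.H.max())           # numerically stable softmax
        return z / z.sum()

    def select(self, rng):
        return int(rng.choice(len(self.H), p=self.probs()))

    def update(self, a, r):
        one_hot = np.zeros_like(self.H)
        one_hot[a] = 1.0
        # Raise H[a] (and lower the others) when r beats the baseline, and vice versa
        self.H += self.alpha * (r - self.baseline) * (one_hot - self.probs())
        self.t += 1
        self.baseline += (r - self.baseline) / self.t   # running mean of rewards


def run_bandit(policy, means, steps=2000, seed=0):
    """Play a stationary Gaussian bandit (unit variance); return pull counts."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(means), dtype=int)
    for _ in range(steps):
        a = policy.select(rng)
        policy.update(a, rng.normal(means[a], 1.0))
        counts[a] += 1
    return counts


means = [0.0, 1.0, 5.0]
for policy in (EpsGreedy(3), UCB(3), PrefUpdate(3)):
    print(type(policy).__name__, run_bandit(policy, means))
```

With well-separated means such as `[0, 1, 5]`, each policy should end up pulling the last arm most often. The interesting differences appear on harder scenarios, where the arm means are close together.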

`MyGridWorld/Src` contains a custom implementation of a Grid World:
- `MyGridWorld` is a class that implements the Grid World;
- `mygw_dp.m` is a script that solves the Grid World problem with Dynamic Programming algorithms;
- `mygw_main.m` is a script that shows the Grid World;
- `mygw_mc.m` is a script that solves the Grid World problem with Monte Carlo methods;
- `mygw_mdp.m` is a script that defines the Grid World problem as an MDP;
- `mygw_td.m` is a script that solves the Grid World problem with Temporal Difference methods;

`TemporalDifference/Src` contains the code to implement Temporal Difference methods:
- `TempDiff` is a class that implements Temporal Difference methods: SARSA, Expected SARSA (ESARSA), Q-learning (QL), and Double Q-learning (DQL);
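
The four TD methods differ only in the target they bootstrap on. The following minimal Python sketch shows the one-step tabular updates on a NumPy array `Q[s, a]`. The function names and signatures are hypothetical and do not reflect the interface of `TempDiff`. The sketch also assumes that terminal states keep `Q = 0`, so terminal handling is left out.

```python
import numpy as np


def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    # SARSA (on-policy): bootstrap on the action a2 actually chosen in s2
    Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])


def expected_sarsa_update(Q, s, a, r, s2, alpha, gamma, eps):
    # Expected SARSA: bootstrap on the mean of Q[s2, :] under an eps-greedy policy
    k = Q.shape[1]
    pi = np.full(k, eps / k)
    pi[np.argmax(Q[s2])] += 1.0 - eps
    Q[s, a] += alpha * (r + gamma * pi @ Q[s2] - Q[s, a])


def q_learning_update(Q, s, a, r, s2, alpha, gamma):
    # Q-learning (off-policy): bootstrap on the greedy action in s2
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])


def double_q_update(Q1, Q2, s, a, r, s2, alpha, gamma, rng):
    # Double Q-learning: with prob. 1/2 swap roles; the updated table picks the
    # greedy action in s2 and the other table evaluates it
    if rng.random() < 0.5:
        Q1, Q2 = Q2, Q1
    best = np.argmax(Q1[s2])
    Q1[s, a] += alpha * (r + gamma * Q2[s2, best] - Q1[s, a])


Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]
q_learning_update(Q, s=0, a=0, r=1.0, s2=1, alpha=0.5, gamma=0.9)
print(Q[0, 0])   # 0.5 * (1 + 0.9 * 3) ≈ 1.85
```

Using two tables in Double Q-learning decouples action selection from action evaluation. This reduces the maximization bias that plain Q-learning can suffer from.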