Optimization Algorithms (old)
GA is based on the process of natural selection and the survival of the fittest. Initially, a population of candidate solutions is randomly generated; it is then evolved over successive generations through selection, crossover, and mutation.
SA is based on the process of slowly cooling a system from a high to a low temperature to reach a state of minimum energy. SA works by first generating an initial random solution to the optimization problem and then repeatedly perturbing the current solution to produce a new candidate solution.
The new solution is always accepted if it improves the cost; otherwise it is accepted with a probability that depends on the current temperature (the Metropolis criterion), which allows the search to escape local optima.
The performance of SA depends strongly on the cooling schedule. Currently, only the exponentially decreasing cooling schedule is available in MOF:

T_(k+1) = α · T_k

where T_k is the temperature at iteration k and α ∈ (0, 1) is the cooling rate.
The above described SA process is repeated until the desired number of SA iterations is reached.
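The SA loop described above, with an exponential cooling schedule, can be sketched as follows. This is a minimal illustration under the assumptions stated in the comments; the function and parameter names are not MOF's:

```python
import math
import random

def simulated_annealing(cost, perturb, initial_solution,
                        t0=100.0, alpha=0.95, n_iters=1000):
    """Minimize `cost` with SA using an exponential cooling schedule.

    cost    -- function mapping a solution to a scalar cost
    perturb -- function producing a new candidate from the current solution
    t0      -- initial temperature (illustrative default)
    alpha   -- cooling rate in (0, 1)
    """
    current = initial_solution
    current_cost = cost(current)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(n_iters):
        candidate = perturb(current)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Metropolis criterion: always accept improvements; accept worse
        # candidates with probability exp(-delta / T).
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= alpha  # exponentially decreasing cooling schedule
    return best, best_cost
```

For example, minimizing f(x) = x² with a uniform ±1 perturbation drives the solution toward 0 as the temperature falls.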
PSA is a form of SA that was created to address one of SA's weaknesses: computational efficiency. It does this by utilizing several computer cores to run SA in parallel, drastically reducing the computational time. While PSA is still a form of SA, the algorithm itself has to be modified to support the use of multiple CPUs.
As it is implemented in MOF, PSA works by following the above workflow. During initialization, the algorithm creates [A] random solutions. It stores these solutions as the initial buffer and uses them to compute an initial temperature. The algorithm then runs [Np] SA runs in parallel, each at a constant temperature. The initial solution for each SA run is selected from the buffer using a probability, and each run iterates for [L] code evaluations. After each SA run is finished, the solutions found by each run are evaluated and the buffer is updated: the best solutions found by each SA run are added and poor solutions that have been outperformed are removed, while maintaining a size of [A]. At this time the temperature value, [Ct], is also updated. The algorithm then starts another set of [Np] SA runs in parallel with the updated temperature and buffer. This cycle repeats [T] times, so the total number of code evaluations is T × Np × L. Once the algorithm is complete, the best solution found can be retrieved from the MOF optimization track file.
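The cycle described above can be sketched as follows. This is a simplified single-process sketch (in MOF the Np runs execute on separate CPUs), and the buffer-selection probability and cooling step shown are illustrative placeholders, not MOF's actual functions:

```python
import math
import random

def sa_run(cost, perturb, start, temperature, n_evals):
    """One constant-temperature SA run of n_evals cost evaluations."""
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for _ in range(n_evals):
        candidate = perturb(current)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

def psa(cost, perturb, random_solution, A=8, Np=4, L=50, T=10):
    # Initialization: A random solutions form the initial buffer; their
    # cost spread sets the initial temperature (illustrative choice).
    buffer = sorted((random_solution() for _ in range(A)), key=cost)
    costs = [cost(s) for s in buffer]
    temperature = (max(costs) - min(costs)) or 1.0
    for _ in range(T):                       # T temperature iterations
        # Np constant-temperature runs; each start is drawn from the
        # buffer with a probability favoring better solutions, here a
        # simple Boltzmann weighting (assumption, not MOF's function).
        weights = [math.exp(-cost(s) / temperature) for s in buffer]
        starts = random.choices(buffer, weights=weights, k=Np)
        results = [sa_run(cost, perturb, s, temperature, L) for s in starts]
        # Buffer update: merge the per-run bests, keep the A best overall.
        buffer = sorted(buffer + [b for b, _ in results], key=cost)[:A]
        temperature *= 0.8  # placeholder cooling; MOF uses the LAM schedule
    return buffer[0]
```

The total number of cost evaluations in the loop is T × Np × L, matching the accounting above.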
The following equations are used in a PSA run:
- Cost / Fitness function - used to evaluate the effectiveness of a given solution. For the sample problems, a fitness function that is determined by the cycle length is used.
- Probability function - within each SA run, a new solution may be accepted as the active solution with a certain probability. The function that defines this probability is the probability function.
- SA initial solution - When starting SA runs in parallel, each run is able to select an initial solution from the buffer using a probability. The probability is determined using the following function.
- Initial Temperature - At the start of a PSA run, the initial temperature is determined using the following function.
- LAM temperature update - After a cycle of parallel SA runs, the temperature is updated using the functions of the "LAM cooling schedule" [1].
PSA, like any meta-heuristic optimization algorithm, has various hyperparameters that need to be selected by the user. In the MOF implementation of PSA, these are the main ones:
- Number of single processor SA iterations (L) - Determines the number of iterations of SA each CPU will run in a single generation. The value can be changed in the yaml file for PSA problems through the variable "optimization → population_size".
- Number of temperature iterations (T) - Determines the number of SA cycles that will be initialized. The value can be changed in the yaml file for PSA problems through the variable "optimization → number_of_generations".
- Buffer size - Determines how many solutions will be stored in the solution buffer. The value can be changed in the yaml file for PSA problems through the variable "optimization → buffer_length".
- Initialization parameter (a) - A parameter used to calculate the starting temperature value when starting a PSA run. The value can be changed in the InitialTemp function in the simulatedAnnealing.py file.
- Quality factor (qualityfactor) - A parameter used to adjust the cooling rate of the temperature, where decreasing this value slows the cooling rate, and vice versa. The value can be changed in the LAM function in the simulatedAnnealing.py file.
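Putting the yaml-configurable hyperparameters together, the optimization block of a PSA problem's yaml file might look like the following. The key names are the ones listed above; the values are purely illustrative:

```yaml
optimization:
  population_size: 50         # L: SA iterations per CPU per generation
  number_of_generations: 100  # T: temperature iterations
  buffer_length: 10           # solution buffer size
```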
RL techniques are framed within an environment that follows a Markov Decision Process (MDP), where an agent learns how to interact with the environment through trial and error across a notion of time hereafter called steps. At each step, the agent observes the state of the environment, takes an action, and receives a reward along with the next state.
PPO is an on-policy technique that optimizes the policy function directly, using a clipped surrogate objective that limits how far the policy can change in a single update.
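PPO's clipped surrogate objective can be illustrated for a single sample; this is a standalone sketch of the standard formula, not the stable-baselines3 implementation:

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO's clipped surrogate objective for one (state, action) sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- estimated advantage of the action
    epsilon   -- clip range limiting how far the new policy may move
    """
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    # Taking the minimum of the unclipped and clipped terms removes any
    # incentive to push the ratio outside [1 - epsilon, 1 + epsilon].
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage the objective stops growing once the ratio exceeds 1 + ε; with a negative advantage it is bounded below by the clipped term, which is what keeps each policy update small.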
SAC is an off-policy algorithm that tries to incorporate the benefits of both DQN and PPO: the former has better sample efficiency since it can reuse past experiences, while the latter is more stable and has better convergence. SAC modifies the typical RL objective of maximizing future rewards to also maximize the entropy of the policy. The entropy can be seen as a measure of randomness in the policy, allowing exploration to be embedded in the RL objective. SAC can be used only for continuous actions. In this work, the SAC implementation in stable-baselines3 is used, with the main hyperparameters being the DNN architectures for the actor (the policy network) and the critic, the learning rate, and the entropy coefficient.
DQN is an off-policy technique that combines Q-Learning with Deep Neural Networks (DNN). In Q-learning, the goal is not to learn the policy but the Q function instead. For every policy, the Q function maps a state-action pair to the expected cumulative future reward obtained by taking that action in that state and following the policy thereafter.
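DQN approximates the Q function with a DNN, but the underlying Q-learning update is the same as in the tabular case, which can be sketched as follows (names are illustrative):

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the Bellman
    target r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(Q[next_state, a] for a in range(n_actions))
    Q[state, action] += alpha * (target - Q[state, action])

# Q is a table defaulting to 0 for unseen (state, action) pairs;
# DQN replaces this table with a neural network over states.
Q = defaultdict(float)
```

A single update after receiving reward 1.0 moves Q(s, a) a fraction alpha of the way toward the Bellman target, which is how value estimates propagate backwards through the state space.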
[1] David J. Kropaczek (2008). Concept for Multi-cycle Nuclear Fuel Optimization Based On Parallel Simulated Annealing With Mixing of States