Junior

The Junior sub agent is the first deep learning method in the CurriculumAgent pipeline, and its goal is to mimic the actions of the greedy Tutor agent. Its purpose is to fit the weights of a sequential neural network to the input data of the Grid2Op environment. After successful training, these weights are passed to the Senior to warm-start the deep reinforcement learning approach. The Junior sub agent therefore plays a vital role in the curriculum approach.

Usage

The Junior is trained on the experience of the Tutor agent. For this reason, the experience output of collect_tutor_experience is first split into training, validation and test sets via load_dataset(). The Junior class can then be used to train and evaluate the deep learning model.

Because the parameters of the Junior agent are similar in many cases, one can alternatively just run the train() method, which combines the collection and the training. Note that if the general_tutor was used with multiple action sets, these sets have to be provided to the Junior as well.
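The splitting step described above can be illustrated with a minimal sketch. It is not the implementation of load_dataset(); the helper name split_indices, the split fractions and the seed are all hypothetical and chosen only for illustration. It shuffles the sample indices once and cuts them into three disjoint parts, which can then be used to index the stored Tutor observations and actions.

```python
import random


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle sample indices and cut them into train/validation/test parts.

    Hypothetical helper for illustration only; the fractions and the seed
    are assumptions, not values taken from CurriculumAgent.
    """
    if n_samples < 0 or val_frac < 0 or test_frac < 0:
        raise ValueError("sizes and fractions must be non-negative")
    if val_frac + test_frac >= 1:
        raise ValueError("validation and test fractions must sum to less than 1")

    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)

    n_val = int(n_samples * val_frac)
    n_test = int(n_samples * test_frac)
    n_train = n_samples - n_val - n_test

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting indices rather than the data itself keeps the observations and their corresponding Tutor actions aligned, since the same index list can be applied to both arrays.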

Structure of the Junior Model

The Junior sub agent is based on a TensorFlow (Keras) sequential model and has the following structure:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 1000)              1222000   
                                                                 
 dense_1 (Dense)             (None, 1000)              1001000   
                                                                 
 dense_2 (Dense)             (None, 1000)              1001000   
                                                                 
 dropout (Dropout)           (None, 1000)              0         
                                                                 
 dense_3 (Dense)             (None, 1000)              1001000   
                                                                 
 dropout_1 (Dropout)         (None, 1000)              0         
                                                                 
 dense_4 (Dense)             (None, 806)               806806    
                                                                 
=================================================================
Total params: 5,031,806
Trainable params: 5,031,806
Non-trainable params: 0
_________________________________________________________________
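The parameter counts in the summary above can be checked by hand: a dense layer with n inputs and m units has n * m weights plus m biases, while dropout layers add no parameters. The short sketch below redoes this arithmetic. Note that the input size of 1221 is not stated in the summary; it is inferred here from the first layer's count (1222000 / 1000 - 1) and should be read as an assumption.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters of a dense layer: weights plus biases."""
    return n_inputs * n_units + n_units


# Assumption: input size inferred from the first layer's parameter count.
INPUT_DIM = 1221

# Units of the five dense layers, in the order they appear in the summary.
# The two dropout layers are skipped because they have no parameters.
layer_units = [1000, 1000, 1000, 1000, 806]

counts = []
n_inputs = INPUT_DIM
for n_units in layer_units:
    counts.append(dense_params(n_inputs, n_units))
    n_inputs = n_units  # the output of one layer feeds the next

total = sum(counts)
print(counts)  # [1222000, 1001000, 1001000, 1001000, 806806]
print(total)   # 5031806
```

The per-layer values and the total agree with the "Param #" column and the "Total params" line of the summary.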