Danny Kuo

This project is built with the Unity ML-Agents package. The goal of the game is for the agent to push the white cube into the green zone. Two white cuboids act as barriers that the agent has to learn to avoid.

We give the agent a reward when it pushes the cube into the green zone, then train it. At the beginning of training, the agent knows nothing about the world, so it moves around more or less randomly.
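To make the reward idea concrete, here is a minimal Python sketch of what such a reward signal could look like. The reward value, the small per-step penalty, the episode length, and the function names are all assumptions for illustration; they are not taken from this project.

```python
from typing import Tuple

# All three constants are assumed values for illustration only.
GOAL_REWARD = 1.0                 # reward for pushing the cube into the green zone
MAX_STEPS_PER_EPISODE = 1000      # hypothetical episode length
STEP_PENALTY = -1.0 / MAX_STEPS_PER_EPISODE  # small cost per step to encourage speed


def step_reward(cube_in_goal: bool) -> Tuple[float, bool]:
    """Return (reward, episode_done) for a single environment step."""
    if cube_in_goal:
        return GOAL_REWARD, True
    return STEP_PENALTY, False


def episode_return(steps_to_goal: int) -> float:
    """Cumulative reward of an episode that reaches the goal on step `steps_to_goal`."""
    return (steps_to_goal - 1) * STEP_PENALTY + GOAL_REWARD
```

Under these assumptions, an agent that wanders randomly collects many step penalties before it reaches the goal, while a trained agent that goes straight to the cube ends up with a return close to the goal reward.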

After training, the agent's behavior is solid and consistent. I think this is because the task I gave the agent is simple, which makes it easy for it to learn how to play the game.

The training result is as follows:

From TensorBoard, we can see the summary statistics:
Cumulative Reward increases over the training session.
Learning Rate is set to decrease over time.
Entropy decreases slowly, as expected in a successful training process.
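The Learning Rate and Entropy curves can be illustrated with a small Python sketch. It assumes a linear decay schedule and an initial learning rate of 3.0e-4, neither of which is stated in this write-up; only max_steps comes from the configuration. The entropy function is the standard Shannon entropy of a discrete action distribution.

```python
import math
from typing import Sequence


def linear_lr(step: int,
              max_steps: int = 500_000,    # max_steps from the configuration
              initial_lr: float = 3.0e-4,  # assumed starting value
              min_lr: float = 1.0e-10) -> float:  # assumed floor
    """Learning rate that decays linearly from initial_lr toward min_lr."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return max(initial_lr * fraction_left, min_lr)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A policy that picks all actions with equal probability has the highest entropy. As the agent becomes more certain about which action to take, the distribution gets more peaked and the entropy drops, which matches the slow decrease seen in TensorBoard.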
This is the training configuration:
max_steps: 5.0e5
batch_size: 128
buffer_size: 2048
beta: 1.0e-2
hidden_units: 256
summary_freq: 2000
time_horizon: 64
num_layers: 2
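A few useful numbers follow directly from these values. The sketch below is plain arithmetic on the configuration above; the variable names are mine, and it assumes that batch_size, buffer_size, and time_horizon are all counted in agent steps.

```python
# Values copied from the training configuration above.
config = {
    "max_steps": 5.0e5,
    "batch_size": 128,
    "buffer_size": 2048,
    "summary_freq": 2000,
    "time_horizon": 64,
}

# Mini-batches needed for one pass over a full experience buffer.
minibatches_per_pass = config["buffer_size"] // config["batch_size"]

# Trajectory segments of length time_horizon that fit in one buffer.
segments_per_buffer = config["buffer_size"] // config["time_horizon"]

# Number of summary points written to TensorBoard over the whole run.
summary_points = int(config["max_steps"]) // config["summary_freq"]

print(minibatches_per_pass, segments_per_buffer, summary_points)
```

So each policy update goes through 16 mini-batches per pass over the buffer, a full buffer holds 32 segments of 64 steps, and the TensorBoard curves contain about 250 points.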
