This project showcases several traditional and Deep RL approaches to building an artificial agent that plays Onitama proficiently. The Minimax models with a search depth of 5 and above were not beaten by any human who competed against them.
The following algorithms were implemented:
- Minimax with Alpha-Beta Pruning (single and multi-processor versions)
- Monte-Carlo Tree Search
- Deep Deterministic Policy Gradient
- Deep Double Dueling Q Networks
The full details and investigation of the implementations of algorithms 1 and 2 (Minimax and MCTS) can be found in TraditionalAI_Report.pdf, and the details for algorithms 3 and 4 (DDPG and D3QN) can be found in
To set up, clone this repository and create a Python 3.6.9 environment with the following libraries installed:
torch==1.8.1 numpy==1.18.5 seaborn==0.11.1 matplotlib==3.3.3 tqdm==4.59.0 torchsummary==1.5.1
This board game is set up as a Markov Decision Process (MDP).
Traditional AI Gameplay
To play against the Minimax and MCTS agents, run the play_onitama function from
play_onitama(game_mode, first_move = None, verbose = 1, minimax_depth = None, minimax_depth_red = None, minimax_depth_blue = None, aivai_turns = 500, timeLimit=None, iterationLimit=None, iteration_red = None, iteration_blue = None, mcts_efficiency = "space", parallel = None)
game_mode selects the type of each of the two players, for example Minimax vs Minimax or a human player vs MCTS. The strength of each AI is set through the other parameters (for example, minimax_depth for the Minimax search depth).
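To illustrate why the search depth controls the strength of a Minimax agent, here is a minimal sketch of depth-limited Minimax with Alpha-Beta pruning on a toy game tree. It is not the repository's implementation: the nested-list tree, the `heuristic` placeholder, and the `visited` list are assumptions made purely for illustration.

```python
import math

# Toy game tree (hypothetical): an int is a leaf score from the maximizing
# player's point of view, a list is a position whose children are its moves.
TREE = [[3, 5], [2, 9], [0, 7]]


def heuristic(node):
    """Placeholder evaluation for a position cut off by the depth limit."""
    return 0.0


def alphabeta(node, depth, alpha=-math.inf, beta=math.inf,
              maximizing=True, visited=None):
    """Depth-limited minimax with alpha-beta pruning on a nested-list tree."""
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return node
    if depth == 0:
        # Depth limit reached on an internal node: estimate instead of searching.
        return heuristic(node)
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent will never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return best


seen = []
deep_value = alphabeta(TREE, depth=2, visited=seen)   # searches down to the leaves
shallow_value = alphabeta(TREE, depth=1)              # stops one ply early
```

With depth 2 the search reaches the real leaf scores while skipping leaves that cannot change the result; with depth 1 it has to rely on the placeholder heuristic, which is why a deeper search generally plays better at a higher computational cost.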
Deep RL Training
To train the Deep RL agents, open Train.py and adjust the parameters of the onitama_deeprl_train function. An example call is shown below:
onitama_deeprl_train("train", "DDPG", 10000, "insert_training_name", "minimax", 1, discount_rate = 0.99, lr_actor = 0.001, lr_critic = 0.001, tau = 0.005, board_input_shape = [10, 5, 5], card_input_shape = 10, num_actions = 40, max_mem_size = 1000000, batch_size = 128, epsilon = 1, epsilon_min = 0.01, update_target = None, val_constant = 10, invalid_penalty = 500, hand_of_god = True, use_competing_AI_replay = False, win_loss_mem_size = 1000, desired_win_ratio = 0.6, use_hardcoded_cards = True, reward_mode = "simple_reward", minimax_boost = 1, mcts_boost = 5000, plot_every = 1000, win_loss_queue_size = 100, architecture = "actions_only", moving_average = 50, verbose = False, valid_rate_freeze = 0.95)
The algorithm can be set to either DDPG or D3QN. The competing agent that the Deep RL agent trains against can be either Minimax or MCTS. The competing agent boosts its strength once the Deep RL agent's win rate over the last win_loss_queue_size episodes rises above desired_win_ratio. The architecture of the neural network (refer to DeepRL_Report.pdf for the details) can be set as well.
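The opponent-boosting rule can be sketched with a fixed-size window of recent results. This is only an illustration under stated assumptions, not the code in Train.py: the `OpponentScheduler` class, its `strength` attribute, waiting for a full window, and clearing the window after a boost are all hypothetical choices; only the names win_loss_queue_size and desired_win_ratio come from the training call above.

```python
from collections import deque


class OpponentScheduler:
    """Hypothetical helper that decides when to strengthen the competing agent."""

    def __init__(self, win_loss_queue_size=100, desired_win_ratio=0.6,
                 boost=1, strength=1):
        # Only the most recent `win_loss_queue_size` results are kept.
        self.results = deque(maxlen=win_loss_queue_size)
        self.desired_win_ratio = desired_win_ratio
        self.boost = boost          # assumed increment, e.g. extra Minimax depth
        self.strength = strength    # assumed meaning: depth or MCTS iterations

    def record(self, agent_won):
        """Store one episode result; return True if the opponent was boosted."""
        self.results.append(1 if agent_won else 0)
        window_full = len(self.results) == self.results.maxlen
        win_ratio = sum(self.results) / len(self.results)
        if window_full and win_ratio > self.desired_win_ratio:
            self.strength += self.boost
            self.results.clear()    # assumption: start a fresh window after a boost
            return True
        return False


scheduler = OpponentScheduler(win_loss_queue_size=5, desired_win_ratio=0.6)
boosted = [scheduler.record(won) for won in (True, True, False, True, True)]
```

In this sketch the fifth episode completes a window with four wins out of five, so the opponent is strengthened once and the window starts over. Requiring a full window avoids boosting the opponent after a lucky first few episodes.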
In one example of the neural network architectures used, the validity and action outputs are computed in separate branches and multiplied together to form the final output layer. More details can be found in