Cooperative Multi-Agent Reinforcement Learning for Tennis
This report presents an implementation of cooperative multi-agent deep reinforcement learning using the MADDPG algorithm in a virtual tennis environment. The objective was to train two agents to work together and keep a ball in play over a net. The agents controlled rackets and received rewards for successful hits and penalties for misses. Using neural networks together with techniques such as experience replay and target networks, the agents reached the target average score, demonstrating the effectiveness of MADDPG for training agents to collaborate in complex environments.
Category:
Machine Learning
Sub-category:
Reinforcement Learning
Overview:
This report explores the implementation of Cooperative Multi-Agent Deep Reinforcement Learning in a virtual tennis environment. The objective was to train two agents to work together and keep a ball in play over a net using the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm.
Description:
The agents controlled rackets and received a reward of +0.1 for successfully hitting the ball over the net, and a penalty of -0.01 for letting the ball hit the ground or go out of bounds. The goal was to achieve an average score of at least +0.5 over 100 consecutive episodes, at which point the task is considered solved.
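The solve criterion above can be sketched as a rolling-average check. This is a minimal illustration; the function name `solved` and the list `episode_scores` are hypothetical, not names from the project code:

```python
def solved(episode_scores, target=0.5, window=100):
    """Return True once the mean score over the last `window` episodes
    reaches `target` (+0.5 over 100 episodes for the Tennis task)."""
    if len(episode_scores) < window:
        return False
    return sum(episode_scores[-window:]) / window >= target

# A run of 100 episodes that each score 0.6 satisfies the criterion.
print(solved([0.6] * 100))
```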
The MADDPG algorithm, an extension of Deep Deterministic Policy Gradient (DDPG) to multi-agent settings, was used. Each agent had its own actor and critic network: the actor mapped states to actions through fully connected layers (100 and 50 neurons), while the critic estimated Q-values through fully connected layers (300 neurons each). Following the MADDPG scheme, each actor acted only on its own observation, while each critic was trained centrally on the joint observations and actions of both agents.
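A minimal PyTorch sketch of the two networks described above, assuming the standard Tennis dimensions (24-dimensional observations and 2-dimensional actions per agent); the class names and exact wiring beyond the stated layer sizes are illustrative, not the project's code:

```python
import torch
import torch.nn as nn

STATE_SIZE, ACTION_SIZE, NUM_AGENTS = 24, 2, 2  # assumed Tennis dimensions

class Actor(nn.Module):
    """Maps one agent's observation to a continuous action in [-1, 1]."""
    def __init__(self, state_size=STATE_SIZE, action_size=ACTION_SIZE):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, action_size), nn.Tanh(),  # bound actions to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Centralized critic: estimates Q from the joint states and actions
    of all agents, as in MADDPG."""
    def __init__(self, state_size=STATE_SIZE, action_size=ACTION_SIZE,
                 num_agents=NUM_AGENTS):
        super().__init__()
        joint = num_agents * (state_size + action_size)
        self.net = nn.Sequential(
            nn.Linear(joint, 300), nn.ReLU(),
            nn.Linear(300, 300), nn.ReLU(),
            nn.Linear(300, 1),  # scalar Q-value
        )

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=-1))
```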
To enhance stability, an experience replay buffer was implemented: sampling past experiences uniformly at random breaks the temporal correlations between consecutive transitions. Separate target networks, updated slowly toward the online networks, were also used to keep the learning targets stable rather than chasing rapidly changing estimates.
The agents solved the tennis environment after approximately 4839 episodes, demonstrating that MADDPG can train multiple agents to collaborate in high-dimensional state spaces.
Language:
Python
Libraries:
Unity ML-Agents, PyTorch, NumPy, OpenAI Gym