This paper was accepted to the “Reinforcement Learning for Real Life” workshop at NeurIPS 2022.
Advances in reinforcement learning (RL) have inspired new directions in the intelligent automation of network defense. However, many of these advances have either outpaced their application to network security or have not considered the challenges of implementing them in the real world. To understand these issues, this paper evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defense agent in a high-fidelity network simulator. All of our approaches are based on the Proximal Policy Optimization (PPO) family of algorithms and include hierarchical RL, action masking, custom training, and ensemble RL. We found the ensemble RL technique to perform best, outperforming our other models and placing second in the competition. To assess applicability to real environments, we evaluated each method's ability to generalize to unseen networks and against an unknown attack strategy. In unseen environments, all of our approaches performed worse, with degradation that varied depending on the type of environmental change. Against an unknown attacker strategy, our models had reduced overall performance even though the new strategy was less efficient than those our models were trained against. Together, these results highlight promising research directions for autonomous network defense in the real world.
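As background for one of the techniques named above, the sketch below illustrates the standard way action masking is layered onto a categorical PPO policy: logits of actions that are invalid in the current state are set to negative infinity so they receive zero probability after the softmax. This is a minimal, generic illustration; the function name, action count, and mask values are hypothetical and not taken from our implementation.

```python
import torch

def sample_masked_action(logits: torch.Tensor, valid_mask: torch.Tensor):
    """Sample from a categorical policy with invalid actions masked out.

    Hypothetical helper for illustration; `valid_mask` is True for actions
    that are legal in the current state.
    """
    # Invalid actions get -inf logits, hence zero probability after softmax.
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    dist = torch.distributions.Categorical(logits=masked_logits)
    action = dist.sample()
    # The log-probability comes from the masked distribution, so a PPO
    # update only ever reinforces actions the agent could actually take.
    return action, dist.log_prob(action)

# Example: six candidate defensive actions, two unavailable this step.
logits = torch.randn(6)
valid_mask = torch.tensor([True, True, False, True, True, False])
action, log_prob = sample_masked_action(logits, valid_mask)
print(f"action={action.item()}, log_prob={log_prob.item():.3f}")
```

Masking at the distribution level, rather than penalizing invalid actions through the reward, keeps the policy gradient well-defined and avoids wasting exploration on actions that can never succeed.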