GARY J. BRIGGS
Harnessing Constructive Simulations for Reinforcement Learning
Reinforcement learning (RL) is a powerful artificial intelligence technique for the development of software agents that make intelligent decisions and exhibit complex behaviors. RL works by applying feedback from the environment, in the form of rewards and penalties, to induce agents to learn how to succeed in that environment. It has famously been used to train agents to defeat human players in classic games of strategy, such as Go and chess.¹
RL training usually takes place within an RL gym, an artificial environment optimized for such training, in which the agent can be run rapidly through the same scenario many times.² It can take millions of iterations to train, test, and refine software agents using these methods, so a fast and efficient RL gym is essential to meeting development timelines. Interacting with unoptimized simulations, or with the real world, would be far slower and more expensive, possibly infeasibly so.
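
To make the reward-and-penalty training loop concrete, the following is a minimal sketch written against the open-source Gymnasium API and its FrozenLake toy task. The package, task, and hyperparameters here are illustrative assumptions for exposition only; they are not part of the RAND harness or AFSIM.

    # Minimal sketch of an RL gym training loop (tabular Q-learning).
    # Gymnasium and FrozenLake are assumed for illustration; any gym-style
    # environment that emits rewards would serve the same purpose.
    import gymnasium as gym
    import numpy as np

    env = gym.make("FrozenLake-v1", is_slippery=False)
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

    for episode in range(10_000):          # many rapid passes through one scenario
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: mostly exploit learned values, sometimes explore.
            if np.random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            # The environment's reward signal drives the value update.
            q_table[state, action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
            )
            state = next_state
            done = terminated or truncated

Because the loop above must execute tens of thousands of episodes, the speed of the environment's step function dominates total training time, which is why a purpose-built gym matters.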
Combining the intelligence of modern RL-trained agents with the depth of established constructive simulations could greatly improve the analytic power of these simulations, enabling researchers to represent more-complicated interactions and more-sophisticated environments. However, although most state-of-the-art RL gyms are written in the Python programming language, most constructive simulations are not. In particular, many military researchers would like to be able to use Python-defined RL agents within the Air Force Research Laboratory's Advanced Framework for Simulation, Integration, and Modeling (AFSIM), which is written in C++.
KEY TAKEAWAYS
■ RAND researchers have developed a flexible software harness that enables the use of state-of-the-art reinforcement learning (RL) methods in many existing constructive simulations without requiring significant additional coding.
■ RL is a powerful artificial intelligence technique that can be used to train software agents in constructive simulations to make decisions that an operator desires or to behave more realistically.
■ Most modern RL gyms (for training software agents) are written in Python, whereas some of the most widely used constructive simulations, such as the Air Force Research Laboratory's Advanced Framework for Simulation, Integration, and Modeling (AFSIM), are written in other programming languages.
■ The RAND RL software harness isolates agent training from agent employment, allowing researchers to use agents trained in modern RL gyms within existing constructive simulations written in C++ (a sketch of this separation follows the list).
■ Researchers at RAND have demonstrated the harness in AFSIM for the case of an aircraft attempting to penetrate an adversary's integrated air defense system.
■ RAND's RL software harness has been made available to all authorized users on the Air Force Research Laboratory's AFSIM portal.
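
The separation of training from employment noted above can be illustrated with a short sketch. One plausible pattern, offered here only as an assumption and not as the RAND harness's actual interface, is to load a policy trained in a Python RL gym and serve it over a local socket so that a C++ simulation can query it for actions at run time. The file name, port, and helper functions below are hypothetical.

    # Hedged sketch of employing a previously trained policy from an
    # external simulation. This illustrates the training/employment split
    # in principle; it is not the RAND harness's documented design.
    import json
    import socket
    import numpy as np

    def load_policy(path: str):
        """Hypothetical loader for a policy trained in a Python RL gym."""
        q_table = np.load(path)                  # e.g., the table trained earlier
        return lambda state: int(np.argmax(q_table[state]))

    def serve_policy(policy, host: str = "127.0.0.1", port: int = 9000) -> None:
        """Answer observation-to-action queries from an external (e.g., C++) sim."""
        with socket.create_server((host, port)) as server:
            while True:
                conn, _ = server.accept()
                with conn:
                    # A single small JSON message per query is assumed here.
                    request = json.loads(conn.recv(4096))
                    action = policy(request["observation"])
                    conn.sendall(json.dumps({"action": action}).encode())

    if __name__ == "__main__":
        serve_policy(load_policy("trained_q_table.npy"))  # hypothetical file

In a pattern like this, the simulation embeds only a thin client that sends an observation and receives an action, so a retrained agent can be swapped in without recompiling the simulation itself.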