RAND: Harnessing Constructive Simulations for Reinforcement Learning (2024), 8 pages

Harnessing Constructive Simulations for Reinforcement Learning
GARY J. BRIGGS
Reinforcement learning (RL) is a powerful artificial intelligence technique for the development of software agents that make intelligent decisions and exhibit complex behaviors. RL works by applying feedback from the environment, in the form of rewards and penalties, to induce agents to learn how to succeed in that environment. It has famously been used to train agents to defeat human players in classic games of strategy, such as Go and Chess.¹

RL training usually takes place within an RL gym, an artificial environment optimized for such training, in which the agent can be run rapidly through the same scenario many times.² It can take millions of iterations to train, test, and refine software agents using these methods, so having a fast and efficient RL gym is essential to meet development timelines. Interacting with unoptimized simulations, or with the real world, would be far slower and more expensive, possibly infeasibly so.
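The reward-and-penalty feedback loop described above can be sketched in a few lines of Python. The corridor environment, reward values, and tabular Q-learning update below are illustrative assumptions for this sketch, not part of the RAND harness or any specific RL gym:

```python
import random

# Toy "gym": a corridor of 5 cells; the agent starts at cell 0 and
# earns a reward only by reaching the goal at cell 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left, step right

def step(state, action):
    """Apply an action, returning (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == GOAL:
        return nxt, 1.0, True   # reward for success
    return nxt, -0.01, False    # small penalty for each wasted step

# Tabular Q-learning: run the same scenario many times, updating
# value estimates from the environment's reward/penalty feedback.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1
random.seed(0)

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda i: Q[state][i])
        nxt, reward, done = step(state, ACTIONS[a])
        # standard Q-learning update rule
        Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
        state = nxt

# After training, the learned policy should prefer "right" in every
# non-goal cell, since moving right is the only way to reach the goal.
policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(GOAL)]
print(policy)
```

Even this trivial environment needs hundreds of episodes to converge, which is why the text stresses that a fast gym is essential when realistic agents require millions of iterations.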
Combining the intelligence of modern RL-trained agents with the depth of established constructive simulations could greatly improve the analytic power of these simulations, enabling researchers to represent more-complicated interactions and more-sophisticated environments. However, although most state-of-the-art RL gyms are written in the Python programming language, most constructive simulations are not. In particular, many military researchers would like to be able to use Python-defined RL agents within the Air Force Research Laboratory's Advanced Framework for Simulation, Integration, and Modeling (AFSIM).
KEY TAKEAWAYS
• RAND researchers have developed a flexible software harness that enables the use of state-of-the-art reinforcement learning (RL) methods in many existing constructive simulations without requiring significant additional coding.
• RL is a powerful artificial intelligence technique that can be used to train software agents in constructive simulations to make decisions desired by the operator or to behave more realistically.
• Most modern RL gyms (for training software agents) are written in Python, whereas some of the most widely used constructive simulations, such as the Air Force Research Laboratory's Advanced Framework for Simulation, Integration, and Modeling (AFSIM), are written in other programming languages.
• The RAND RL software harness isolates agent training from agent employment, allowing researchers to use agents trained in modern RL gyms within existing constructive simulations written in C++.
• Researchers at RAND have demonstrated the harness in AFSIM for the case of an aircraft attempting to penetrate an adversary's integrated air defense system.
• RAND's RL software harness has been made available to all authorized users on the Air Force Research Laboratory's AFSIM portal.
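The separation of agent training from agent employment can be illustrated with a minimal sketch. Everything here is an assumption for illustration only (the policy table, the JSON file format, and the `act` helper are hypothetical); the report excerpt does not specify how the RAND harness actually exchanges data between Python and C++ simulations:

```python
import json

# Training side (Python RL gym): suppose training has produced a
# simple state -> action policy. This table is a made-up stand-in
# for a trained agent.
trained_policy = {0: "climb", 1: "turn", 2: "descend"}

# Export the frozen policy in a language-neutral format that the
# employment side (e.g., a C++ simulation such as AFSIM) could parse
# with any standard JSON library.
with open("policy.json", "w") as f:
    json.dump({str(s): a for s, a in trained_policy.items()}, f)

# Employment side: load the frozen policy and query it at run time.
# No RL library is needed here, which is the point of isolating
# training from employment.
with open("policy.json") as f:
    policy = json.load(f)

def act(state):
    """Return the trained agent's action for a simulation state."""
    return policy[str(state)]

print(act(1))  # -> turn
```

In practice a neural-network policy would need a richer exchange format or an embedded inference runtime rather than a lookup table, but the division of labor is the same: the expensive learning loop stays in the Python gym, and the simulation only ever sees a fixed, queryable policy.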
Research Report