Create an OpenAI Plugin¶
SCRIMMAGE can be used in conjunction with OpenAI to generate environments for learning scenarios. OpenAI environments need to have a state space, action space, and a reward function. In this tutorial, we will go through a quick example of how to set up a scrimmage agent to interact with OpenAI by providing a state space, action space, and reward. We will use the plugins project we created in a previous tutorial (Create a SCRIMMAGE Plugins Project) to build our new OpenAI plugins. To start with, we will need to create an Autonomy and a Sensor plugin. Files similar to what is described below are located in the following places:
scrimmage/include/scrimmage/plugins/autonomy/RLSimple
scrimmage/src/plugins/autonomy/RLSimple
scrimmage/include/scrimmage/plugins/sensor/RLSimpleSensor
scrimmage/src/plugins/sensor/RLSimpleSensor
scrimmage/missions/rlsimple.xml
scrimmage/test/test_openai.py
In addition, the interface between python and scrimmage is defined in scrimmage/python/scrimmage/bindings/src/py_openai_env.h.
We can use the script provided with SCRIMMAGE to create these plugins (we will
call them something different in this tutorial). To do so,
enter the following at the terminal:
cd /path/to/scrimmage/scripts
./generate-plugin.sh autonomy SimpleLearner ~/scrimmage/my-scrimmage-plugins
./generate-plugin.sh sensor MyOpenAISensor ~/scrimmage/my-scrimmage-plugins
Now, let's build the plugins that were placed in the my-scrimmage-plugins project:
cd ~/scrimmage/my-scrimmage-plugins/build
cmake ..
make
source ~/.scrimmage/setup.bash # you probably already did this step
If the plugins built successfully, you will see the following output:
[ 50%] Built target MyOpenAISensor_plugin
[100%] Built target SimpleLearner
In general, the SCRIMMAGE OpenAI interface allows you to customize the environment in the following ways:
n agents - the most common scenario is a single agent, but SCRIMMAGE also supports multi-agent reinforcement learning. In particular, it supports centralized approaches where the actions and observations of the individual learning agents are combined, and decentralized approaches where multiple agents operate independently.
Action/Observation space - this can be discrete, continuous, or combined (the latter would be a TupleSpace in OpenAI). For a demonstration of these combinations, see test/test_openai.py and the sketch below.
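As a rough illustration (a sketch of the gym space types involved, not SCRIMMAGE code), the three cases correspond to the following kinds of gym spaces:

import numpy as np
from gym import spaces

discrete = spaces.Discrete(2)                                        # one discrete action with 2 values
box = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)   # one continuous action
combined = spaces.Tuple((discrete, box))                             # both discrete and continuous -> Tuple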
For this tutorial, we will only be investigating how to develop a discrete
action space and a continuous observation space.
Rewrite the Autonomy Plugin¶
OpenAI Autonomy Plugin Header File¶
In this tutorial, we are going to take the autonomy plugin and rewrite it for use with OpenAI. Let's start with the header file located at ~/scrimmage/my-scrimmage-plugins/include/my-scrimmage-plugins/plugins/autonomy/SimpleLearner/SimpleLearner.h.
Normal autonomy plugins extend the Autonomy class. For the OpenAI autonomy plugin, we will instead extend the ScrimmageOpenAIAutonomy class.
In this case, the init and step_autonomy functions become init_helper and step_helper (the base class handles init and step_autonomy and calls these methods). These methods can be treated similarly to init and step_autonomy and exist so that the user can switch between a learning and a non-learning mode seamlessly. In addition, ScrimmageOpenAIAutonomy defines some extra components:
reward_range - this setting maps directly to an OpenAI environment's reward_range.
action_space - this is a struct containing a vector of discrete_counts and a vector of continuous_extrema. For each discrete action you want your autonomy to have, add an element to action_space.discrete_counts with the number of discrete actions (this will result in a Discrete or MultiDiscrete OpenAI space, depending on whether there is one element or multiple elements in action_space.discrete_counts). Similarly, if you want continuous actions, set action_space.continuous_extrema with the low and high values (this will result in a Box space). If both discrete_counts and continuous_extrema have elements, then the resulting space will be an OpenAI TupleSpace.
set_environment - a virtual method that will be called after all agents have been generated. It is meant to set reward_range and action_space.
calc_reward - called every timestep, this function returns a tuple with the following order:
done: whether the environment is done
reward: what the reward for that step is
info: a pybind11 dict object that can be used for debugging. An example of its use is given below. To return an empty dict (the typical case), just use pybind11::dict()
In python, when you call env.step(action), the action will be copied to your autonomy's action member (defined in ScrimmageOpenAIAutonomy).
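For example (a minimal sketch that assumes the scrimmage-v0 environment registered later in this tutorial), a discrete action is passed directly to env.step, and the dict returned by calc_reward comes back as the info value:

import gym

env = gym.make('scrimmage-v0')          # registered later in this tutorial
obs = env.reset()
obs, reward, done, info = env.step(1)   # 1 is copied into the autonomy's `action` member
print(info)                             # debugging entries from calc_reward, e.g. {'x_within_radius': True}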
Here is the include file for SimpleLearner:
#ifndef INCLUDE_MY_SCRIMMAGE_PLUGINS_PLUGINS_AUTONOMY_SIMPLELEARNER_SIMPLELEARNER_H_
#define INCLUDE_MY_SCRIMMAGE_PLUGINS_PLUGINS_AUTONOMY_SIMPLELEARNER_SIMPLELEARNER_H_

#include <scrimmage/plugins/autonomy/ScrimmageOpenAIAutonomy/ScrimmageOpenAIAutonomy.h>

#include <map>
#include <string>
#include <utility>

class SimpleLearner : public scrimmage::autonomy::ScrimmageOpenAIAutonomy {
 public:
    void init_helper(std::map<std::string, std::string> &params) override;
    bool step_helper() override;

    void set_environment() override;
    std::tuple<bool, double, pybind11::dict> calc_reward() override;

 protected:
    double radius_;
    uint8_t output_vel_x_idx_ = 0;
};

#endif // INCLUDE_MY_SCRIMMAGE_PLUGINS_PLUGINS_AUTONOMY_SIMPLELEARNER_SIMPLELEARNER_H_
Note that we are overriding four virtual functions: init_helper, step_helper, set_environment, and calc_reward.
OpenAI Autonomy Plugin Source File¶
Now let's open our source file located at ~/scrimmage/my-scrimmage-plugins/src/plugins/autonomy/SimpleLearner/SimpleLearner.cpp.
We will first change the includes at the top of the file to be:
#include <scrimmage/math/State.h>
#include <scrimmage/parse/ParseUtils.h>
#include <scrimmage/plugin_manager/RegisterPlugin.h>

#include <my-scrimmage-plugins/plugins/autonomy/SimpleLearner/SimpleLearner.h>

REGISTER_PLUGIN(scrimmage::Autonomy, SimpleLearner, SimpleLearner_plugin)
Next, let us look at the init_helper function:
void SimpleLearner::init_helper(std::map<std::string, std::string> &params) {
    using Type = scrimmage::VariableIO::Type;
    using Dir = scrimmage::VariableIO::Direction;

    output_vel_x_idx_ = vars_.declare(Type::velocity_x, Dir::Out);
    const uint8_t output_vel_y_idx = vars_.declare(Type::velocity_y, Dir::Out);
    const uint8_t output_vel_z_idx = vars_.declare(Type::velocity_z, Dir::Out);

    vars_.output(output_vel_x_idx_, 0);
    vars_.output(output_vel_y_idx, 0);
    vars_.output(output_vel_z_idx, 0);

    radius_ = std::stod(params.at("radius"));
}
We now define the environment:
void SimpleLearner::set_environment() {
    reward_range = std::make_pair(0, 1);
    action_space.discrete_count.push_back(2);
}
This says that the reward range will be between 0 and 1
and we will have a single discrete action that can take values of 0 or 1.
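On the python side this declaration shows up in the environment's spaces; a quick sketch (again assuming the scrimmage-v0 registration shown later):

import gym

env = gym.make('scrimmage-v0')   # registered later in this tutorial
print(env.action_space)          # Discrete(2)
print(env.reward_range)          # (0.0, 1.0)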
We now define the calc_reward function:
std::tuple<bool, double, pybind11::dict> SimpleLearner::calc_reward() {
    const bool done = false;
    const double x = state_->pos()(0);
    const bool within_radius = std::round(std::abs(x)) < radius_;
    double reward = within_radius ? 1 : 0;

    // here we set up the debugging info
    pybind11::dict info;
    info["x_within_radius"] = within_radius; // an example of adding debugging information

    return std::make_tuple(done, reward, info);
}
This says that the autonomy is never going to end the simulation and gives a reward for being within the radius of the origin. We now define step_helper to handle actions given from python. It will output a positive x-velocity when the action is 1 and a negative x-velocity when the action is 0:
bool SimpleLearner::step_helper() {
    const double x_vel = action.discrete[0] ? 1 : -1;
    vars_.output(output_vel_x_idx_, x_vel);
    return true;
}
Rewrite CMakeLists.txt for OpenAI Autonomy¶
The SimpleLearner C++ code is now finished. Before we can build it though, we do need to make a small edit to the CMakeLists.txt. Open up ~/scrimmage/my-scrimmage-plugins/src/plugins/autonomy/SimpleLearner/CMakeLists.txt and change the TARGET_LINK_LIBRARIES block on line 15 to the following:
TARGET_LINK_LIBRARIES(${LIBRARY_NAME}
  scrimmage-core
  ScrimmageOpenAIAutonomy_plugin
  )
This makes sure the plugin links to the libraries it needs.
Plugin Parameter File¶
The following is the parameter file for SimpleLearner:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://gtri.gatech.edu"?>
<params>
  <library>SimpleLearner_plugin</library>
  <radius>2</radius>
  <module>my_openai</module>
  <actor_init_func>return_action_func</actor_init_func>
</params>
module and actor_init_func exist so that we can call our learner outside the OpenAI environment. This is useful in case we want to train using python and then do a lot of runs to test/verify what has been learned. We will discuss this later in Run In Non-learning Mode (for non-Tensorflow-based code).
From here, we can now build the project:
cd ~/scrimmage/my-scrimmage-plugins/build
cmake ..
make
Rewrite the Sensor Plugin¶
OpenAI Sensor Plugin Header File¶
The sensor plugin is very similar to the autonomy plugin. It inherits
from ScrimmageOpenAISensor
which provides
the following:
observation_space - this has the same effect as action_space above but determines the environment's observation space.
set_observation_space - this is similar to set_environment above but is designed to set the variable observation_space after all entities have been generated.
get_observation - there are two versions of this virtual function: one for discrete observations and another for continuous observations. Note that because observations can sometimes be high dimensional, these functions directly edit the underlying python buffers. This avoids a needless copy.
Now let's move on to defining the observation space. We shall do this through a sensor plugin. We shall start by rewriting the header file for the sensor plugin we created above. You can find it at ~/scrimmage/my-scrimmage-plugins/include/my-scrimmage-plugins/plugins/sensor/MyOpenAISensor/MyOpenAISensor.h.
First up, we shall rewrite MyOpenAISensor.h to be the following. The main thing to note is that it inherits from ScrimmageOpenAISensor and overrides two virtual methods:
#ifndef INCLUDE_MY_SCRIMMAGE_PLUGINS_PLUGINS_SENSOR_MYOPENAISENSOR_MYOPENAISENSOR_H_
#define INCLUDE_MY_SCRIMMAGE_PLUGINS_PLUGINS_SENSOR_MYOPENAISENSOR_MYOPENAISENSOR_H_

#include <scrimmage/plugins/sensor/ScrimmageOpenAISensor/ScrimmageOpenAISensor.h>

#include <map>
#include <string>
#include <vector>

class MyOpenAISensor : public scrimmage::sensor::ScrimmageOpenAISensor {
 public:
    void set_observation_space() override;
    void get_observation(double* data, uint32_t beg_idx, uint32_t end_idx) override;
};

#endif // INCLUDE_MY_SCRIMMAGE_PLUGINS_PLUGINS_SENSOR_MYOPENAISENSOR_MYOPENAISENSOR_H_
OpenAI Sensor Plugin Source File¶
From here, we can now look at the implementation of these methods in ~/scrimmage/my-scrimmage-plugins/src/plugins/sensor/MyOpenAISensor/MyOpenAISensor.cpp.
In this source file, we need to add the following includes:
#include <my-scrimmage-plugins/plugins/sensor/MyOpenAISensor/MyOpenAISensor.h>

#include <scrimmage/entity/Entity.h>
#include <scrimmage/math/State.h>
#include <scrimmage/plugin_manager/RegisterPlugin.h>

#include <limits>
#include <utility>

REGISTER_PLUGIN(scrimmage::Sensor, MyOpenAISensor, MyOpenAISensor_plugin)

void MyOpenAISensor::get_observation(double *data, uint32_t beg_idx, uint32_t /*end_idx*/) {
    data[beg_idx] = parent_->state()->pos()(0);
}

void MyOpenAISensor::set_observation_space() {
    const double inf = std::numeric_limits<double>::infinity();
    observation_space.continuous_extrema.push_back(std::make_pair(-inf, inf));
}
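Since this sensor declares a single unbounded continuous value and writes the vehicle's x position into it, the observation seen from python should be a one-element Box. A quick sketch (assuming the scrimmage-v0 registration shown later):

import gym

env = gym.make('scrimmage-v0')   # registered later in this tutorial
print(env.observation_space)     # a Box with one element and (-inf, inf) bounds
obs = env.reset()
print(obs)                       # the entity's x position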
Plugin Parameter File¶
The following is the parameter file for MyOpenAISensor:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://gtri.gatech.edu"?>
<params>
  <library>MyOpenAISensor_plugin</library>
</params>
Rewrite CMakeLists.txt for OpenAI Sensor¶
The MyOpenAISensor C++ code is now finished. Before we can build it though, we do need to make a small edit to its CMakeLists.txt. Open up ~/scrimmage/my-scrimmage-plugins/src/plugins/sensor/MyOpenAISensor/CMakeLists.txt and change the TARGET_LINK_LIBRARIES block on line 15 to the following:
TARGET_LINK_LIBRARIES(${LIBRARY_NAME}
  scrimmage-core
  ScrimmageOpenAISensor_plugin
  )
OpenAI Mission XML File¶
Now that our code for SCRIMMAGE has been compiled (re-run make in the build directory if you have not rebuilt since editing the sensor's CMakeLists.txt), we can create a simple mission XML file for it. We will save this XML at ~/scrimmage/my-scrimmage-plugins/missions/openai_mission.xml.
To create the environment as we described above, the mission xml would need the following blocks (More detail on creating mission files is located at Mission XML Tag Definitions ):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://gtri.gatech.edu"?>
<runscript xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           name="Straight flying">

  <run start="0.0" end="100" dt="1"
       time_warp="10"
       enable_gui="true"
       network_gui="false"
       start_paused="true"/>

  <stream_port>50051</stream_port>
  <stream_ip>localhost</stream_ip>
  <end_condition>time</end_condition> <!-- time, one_team, none-->
  <grid_spacing>1</grid_spacing>
  <grid_size>1000</grid_size>
  <gui_update_period>10</gui_update_period> <!-- milliseconds -->
  <output_type>summary</output_type>
  <metrics>OpenAIRewards</metrics>
  <background_color>191 191 191</background_color> <!-- Red Green Blue -->
  <log_dir>~/.scrimmage/logs</log_dir>

  <entity_common name="all">
    <count>1</count>
    <health>1</health>
    <radius>1</radius>
    <team_id>1</team_id>
    <visual_model>Sphere</visual_model>
    <motion_model>SingleIntegrator</motion_model>
    <controller>SingleIntegratorControllerSimple</controller>
    <sensor>MyOpenAISensor</sensor>
    <autonomy>SimpleLearner</autonomy>
    <y>0</y>
    <z>0</z>
  </entity_common>

  <entity entity_common="all">
    <x>0</x>
    <color>77 77 255</color>
  </entity>
</runscript>
We have now completed our work on the SCRIMMAGE side. All that is left is to write the python code to run our OpenAI environment.
Running The OpenAI Environment¶
The following python code will create a scrimmage environment using the mission file we created above. It will then do a simple environment test by stepping through the environment and keeping track of the observations, sending a random action (0 or 1) at each timestep. At the end, it closes the environment and prints out the total reward. We will save this python file at ~/scrimmage/my-scrimmage-plugins/my_openai.py.
import copy
import gym
import scrimmage.utils
import random


def get_action(obs):
    return random.randint(0, 1)


# Used for non-learning mode
def return_action_func(action_space, obs_space, params):
    return get_action


def test_openai():
    try:
        env = gym.make('scrimmage-v0')
    except gym.error.Error:
        mission_file = scrimmage.utils.find_mission('openai_mission.xml')
        gym.envs.register(
            id='scrimmage-v0',
            entry_point='scrimmage.bindings:ScrimmageOpenAIEnv',
            max_episode_steps=1e9,
            reward_threshold=1e9,
            kwargs={"enable_gui": False,
                    "mission_file": mission_file}
        )
        env = gym.make('scrimmage-v0')

    # the observation is the x position of the vehicle
    # note that a deepcopy is used when a history
    # of observations is desired. This is because
    # the sensor plugin edits the data in-place
    obs = []
    temp_obs = copy.deepcopy(env.reset())
    obs.append(temp_obs)

    total_reward = 0
    for i in range(200):
        action = get_action(temp_obs)
        temp_obs, reward, done = env.step(action)[:3]
        obs.append(copy.deepcopy(temp_obs))
        total_reward += reward

        if done:
            break

    env.close()
    print("Total Reward: %2.2f" % total_reward)


if __name__ == '__main__':
    test_openai()
If you haven't done so already, source the environment setup file; otherwise the mission file below will not be found:
source ~/.scrimmage/setup.bash # you probably already did this step
Now that we have completed all of the code, we can simply type the following into the terminal to see it run!
$ python my_openai.py
Run In Non-learning Mode (for non-Tensorflow-based code)¶
In Running The OpenAI Environment we ran an OpenAI environment using the newly defined environments from SCRIMMAGE. If my_openai.py had done something with the data from the environment (that is, actually learned something useful), you may now want to test/validate what has been learned using, for instance, Multiple Runs on a Local Computer. In other words, we want to be able to run this line:
$ scrimmage missions/openai_mission.xml
as well as
$ python my_openai.py
Because you set module to "my_openai" and actor_init_func to "return_action_func" in your SimpleLearner.xml file, SCRIMMAGE knows where to find what it needs. actor_init_func is a function that takes in the action space, observation space, and parameters from SCRIMMAGE in order to return a function that will then be used during non-learning mode. This setup is useful in cases where some initialization is needed before actions can be returned. For example, there might be a directory pointing to some saved model that needs to be loaded first. By adding a directory parameter to the SimpleLearner.xml file, it will then be passed down to the function named in actor_init_func. The function returned from actor_init_func should take in an observation and return an action.
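As a sketch only (model_dir and load_model below are hypothetical names used for illustration, and params is assumed to behave like a dict of the plugin's XML parameters), such an actor_init_func might look like this:

# Hypothetical example: 'model_dir' and 'load_model' are illustrative, not SCRIMMAGE names.
def return_action_func(action_space, obs_space, params):
    model = load_model(params['model_dir'])   # e.g. a <model_dir> tag added to SimpleLearner.xml

    def act(obs):
        return model.predict(obs)             # must return an action valid for action_space
    return act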
To use non-learning mode, nonlearning_mode_openai_plugin should be set to true in your mission or plugin parameter file (for example, by adding <nonlearning_mode_openai_plugin>true</nonlearning_mode_openai_plugin> to SimpleLearner.xml). Note that if your action method uses Tensorflow, you will need to use Run In Non-learning Mode using gRPC (for Tensorflow-based code).
$ scrimmage missions/openai_mission.xml
[> ] 0 %
================================================================================
OpenAIRewards
================================================================================
Reward for id 1 = 20
Simulation Complete
Run In Non-learning Mode using gRPC (for Tensorflow-based code)¶
In Run In Non-learning Mode (for non-Tensorflow-based code), we could start up a trained agent directly inside of SCRIMMAGE. However, if your training code uses Tensorflow, that method won't work. What you will need to do instead is use the gRPC mode for non-learning mode. This creates a gRPC server in python that will then run the agent and send the action back across to SCRIMMAGE. To use this mode, module and actor_init_func will need to be set as before. Then, grpc_mode will also need to be set to "true" in SimpleLearner.xml. Additional arguments such as grpc_address or port can also be changed to control what address and port number the gRPC server will be located at, but the defaults should work fine for most users (localhost on port 50051).
Once the parameters are set, you should be able to run this line:
$ scrimmage missions/openai_mission.xml
PYTHON: GRPC Server Started
Connecting to gRPC Server attempt: 1/ 10
================================================================================
OpenAIRewards
================================================================================
Reward for id 1 = 20
Simulation Complete