Development#

We begin by installing dev dependencies:

pip install -e ".[dev]"

Training Policies#

We primarily use TD3 (scripts/td3.py) to train policies, and we hand-pick checkpoints at regular intervals to obtain policies of varying quality.

Run (example):

python scripts/td3.py --env Hopper-v2

For more details, refer to:

python scripts/td3.py --help

Policies are saved to opcc/assets/<env-name>/model/model_<model-id>.p, where <model-id> is a whole number. By convention, larger ids are assigned to lower-quality policies.
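
A saved checkpoint can be inspected directly. Below is a minimal sketch, assuming the .p files are torch-loadable pickles; the exact contents of a checkpoint are an assumption here, not a documented API.

import torch  # assumption: checkpoints are torch pickles

env_name = "Hopper-v2"
model_id = 1  # by convention, larger ids mean lower-quality policies
path = f"opcc/assets/{env_name}/model/model_{model_id}.p"

checkpoint = torch.load(path, map_location="cpu")
print(type(checkpoint))  # inspect what the pickle actually holds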

Generate Queries#

To generate queries for the considered environments and selected policies, run the following commands:

# MuJoCo (Gym) environments
python scripts/generate_queries.py --env-name HalfCheetah-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name Hopper-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name Walker2d-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb

# Maze environments
python scripts/generate_queries.py --env-name d4rl:maze2d-large-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name d4rl:maze2d-umaze-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name d4rl:maze2d-medium-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name d4rl:maze2d-open-v0 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.5 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb

The available command-line arguments are described by the following command:

python scripts/generate_queries.py --help
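
To make these knobs concrete: each query asks which of two policies obtains the higher return over a fixed horizon, and ground-truth answers come from averaging Monte-Carlo rollout returns (--eval-runs rollouts per estimate). The sketch below is illustrative only, not the generate_queries.py implementation; policy is a hypothetical callable mapping observations to actions, and the pre-0.26 gym reset/step API of these v2 environments is assumed.

import gym
import numpy as np

def mc_return(env_name, policy, horizon, eval_runs=10):
    """Average undiscounted return of `policy` over `horizon` steps."""
    env = gym.make(env_name)
    returns = []
    for _ in range(eval_runs):
        obs, total = env.reset(), 0.0
        for _ in range(horizon):
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))

# A query label then reduces to comparing the two estimates, e.g.:
# label = mc_return(env, policy_a, h) < mc_return(env, policy_b, h)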

Pre-trained Policy Stats#

The output of the following command is used to update the benchmark information in README.md or docs/source/benchmark-information.rst:

python scripts/generate_policy_stats.py --all-envs

For more usage details, refer to:

python scripts/generate_policy_stats.py --help
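
For context, a policy's quality in such tables is typically summarized as the mean and standard deviation of its episodic returns. Below is a minimal sketch of computing that summary, under the same assumptions as the earlier sketch (hypothetical policy callable, pre-0.26 gym API):

import gym
import numpy as np

def episodic_return_stats(env_name, policy, eval_runs=10):
    """Mean and std of full-episode returns for `policy`."""
    env = gym.make(env_name)
    returns = []
    for _ in range(eval_runs):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))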

Testing Package#

  1. Install dependencies:
    pip install -e ".[test]"
    
  2. Testing is computationally expensive, as we validate ground-truth value estimates and their corresponding labels. These validations can be disabled by setting the following flags (a sketch of how such flags typically gate tests follows this list):

    export SKIP_QUERY_TARGET_TESTS=1       # disables target estimation and label validation
    export SKIP_Q_LEARNING_DATASET_TEST=1  # disables the q-learning dataset existence test
    export SKIP_SEQUENCE_DATASET_TEST=1    # disables the sequence dataset existence test
    
  3. Run:
    pytest -v --xdoc
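
For reference, here is a minimal sketch of how such environment flags typically gate tests via pytest's skipif marker; the test name is hypothetical.

import os
import pytest

@pytest.mark.skipif(
    os.environ.get("SKIP_QUERY_TARGET_TESTS", "0") == "1",
    reason="target estimation and label validation disabled via env flag",
)
def test_query_targets():  # hypothetical test name
    ...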
    

Generate Docs#

  1. Install dependencies:

    pip install -e ".[docs]"
    
  2. Install Pandoc.

  3. Generate the Sphinx docs:

    sphinx-build -M html docs/source/ docs/build/ -a
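
The generated pages land in docs/build/html; open docs/build/html/index.html in a browser to preview them.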