Development#
We begin by installing the dev dependencies:
pip install -e ".[dev]"
Training Policies#
We primarily use TD3 for training policies and hand-pick checkpoints at regular intervals to obtain policies of varying quality.
Run (example):
python scripts/td3.py --env Hopper-v2
For more details, refer to:
python scripts/td3.py --help
The policies are saved in opcc/assets/<env-name>/model/model_<model-id>.p, where <model-id> is a whole number. By convention, larger ids are assigned to lower-quality policies, so model_1.p corresponds to the highest-quality policy for an environment.
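For example, assuming four checkpoints per environment (matching the --policy-ids 1 2 3 4 used in the query-generation commands below), the Hopper-v2 checkpoints can be listed as follows; the filenames in the comment are illustrative:
ls opcc/assets/Hopper-v2/model/
# model_1.p  model_2.p  model_3.p  model_4.p   (model_1.p: highest quality)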
Generate Queries#
To generate queries for the considered environments and the selected policies, run the following commands:
# MuJoCo (Gym) environments
python scripts/generate_queries.py --env-name HalfCheetah-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name Hopper-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name Walker2d-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
# Maze environments
python scripts/generate_queries.py --env-name d4rl:maze2d-large-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name d4rl:maze2d-umaze-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name d4rl:maze2d-medium-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
python scripts/generate_queries.py --env-name d4rl:maze2d-open-v0 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.5 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
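Since these invocations differ only in the environment name and noise level, they can also be scripted. The loop below is a convenience sketch (it assumes bash 4+ for associative arrays) that reuses the exact flags from the commands above:
declare -A NOISE=(
  ["HalfCheetah-v2"]=0.1 ["Hopper-v2"]=0.1 ["Walker2d-v2"]=0.1
  ["d4rl:maze2d-large-v1"]=0.2 ["d4rl:maze2d-umaze-v1"]=0.2
  ["d4rl:maze2d-medium-v1"]=0.2 ["d4rl:maze2d-open-v0"]=0.5
)
for env in "${!NOISE[@]}"; do
  python scripts/generate_queries.py --env-name "$env" --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise "${NOISE[$env]}" --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb
done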
You can inspect the available command-line arguments with the following command:
python scripts/generate_queries.py --help
Pre-trained policy stats#
The output of the following command is used to update the benchmark information in readme.md or docs/source/benchmark-information.rst:
python scripts/generate_policy_stats.py --all-envs
Also, refer to the following for more usage details:
python scripts/generate_policy_stats.py --help
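For instance, the stats can be redirected to a file and then pasted manually into the docs; the output path here is an arbitrary choice:
python scripts/generate_policy_stats.py --all-envs > policy_stats.txt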
Testing Package#
- Install dependencies:
pip install -e ".[test]"
Testing is computationally expensive, as we validate ground-truth value estimates and the corresponding labels. These checks can be disabled by setting the following flags:
export SKIP_QUERY_TARGET_TESTS=1        # disable target estimation and label validation
export SKIP_Q_LEARNING_DATASET_TEST=1   # disable test for checking dataset existence
export SKIP_SEQUENCE_DATASET_TEST=1     # disable test for checking sequence dataset
- Run:
pytest -v --xdoc
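Alternatively, the flags can be set for a single invocation instead of exporting them (a minimal sketch of the same skips):
SKIP_QUERY_TARGET_TESTS=1 SKIP_Q_LEARNING_DATASET_TEST=1 SKIP_SEQUENCE_DATASET_TEST=1 pytest -v --xdoc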
Generate Docs#
Install dependencies:
pip install -e ".[docs]"
Install Pandoc.
Generate the Sphinx docs:
sphinx-build -M html docs/source/ docs/build/ -a
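The HTML output lands in docs/build/html and can be previewed with Python's built-in static file server (any static server works; port 8000 is an arbitrary choice):
python -m http.server --directory docs/build/html 8000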