API Reference
- opcc.get_dataset_names(env_name)[source]
Retrieves the list of dataset names available for an environment.
- Parameters:
env_name (str) – name of the environment
- Returns:
A list of dataset names
- Return type:
list[str]
- Example:
>>> import opcc
>>> opcc.get_dataset_names('Hopper-v2')
['random', 'expert', 'medium', 'medium-replay', 'medium-expert']
- opcc.get_env_names()[source]
Retrieves the list of environments for which queries are available.
- Returns:
A list of environment names
- Return type:
list[str]
- Example:
>>> import opcc
>>> opcc.get_env_names()
['HalfCheetah-v2', 'Hopper-v2', 'Walker2d-v2', 'd4rl:maze2d-large-v1', 'd4rl:maze2d-medium-v1', 'd4rl:maze2d-open-v0', 'd4rl:maze2d-umaze-v1']
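Combined with opcc.get_dataset_names(), this makes it straightforward to enumerate every (environment, dataset) pair shipped with the package. A minimal sketch (output omitted; it assumes every listed environment also has registered datasets):

>>> import opcc
>>> for env_name in opcc.get_env_names():
...     for dataset_name in opcc.get_dataset_names(env_name):
...         print(env_name, dataset_name)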
- opcc.get_policy(env_name, pre_trained=1)[source]
Retrieves a pre-trained policy for the environment at the given quality level.
- Parameters:
env_name (str) – name of the environment
pre_trained (int) – pre-trained quality level of the policy. It should be between 1 and 4 (inclusive), where 1 indicates the best model and 4 the worst.
- Returns:
A tuple of two objects: (1) the policy network and (2) a dictionary of performance stats for the policy on the given environment
- Return type:
tuple of (ActorNetwork, dict)
- Example:
>>> import opcc, torch
>>> policy, policy_stats = opcc.get_policy('d4rl:maze2d-open-v0', pre_trained=1)
>>> observation = torch.DoubleTensor([[0.5, 0.5, 0.5, 0.5]])
>>> action = policy(observation)
>>> action
tensor([[0.9977, 0.9998]], dtype=torch.float64, grad_fn=<MulBackward0>)
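The returned stats can be sanity-checked against a fresh rollout. Below is a minimal sketch, not part of the opcc API; it assumes the environment id is accepted by gym.make and that the environment follows the old-style Gym step API (observation, reward, done, info) used by d4rl:

>>> import gym, torch, opcc
>>> policy, policy_stats = opcc.get_policy('d4rl:maze2d-open-v0', pre_trained=1)
>>> env = gym.make('d4rl:maze2d-open-v0')
>>> obs, done, episode_return = env.reset(), False, 0.0
>>> while not done:
...     with torch.no_grad():
...         action = policy(torch.as_tensor(obs, dtype=torch.float64).unsqueeze(0))
...     obs, reward, done, info = env.step(action.squeeze(0).numpy())
...     episode_return += reward
>>> episode_return  # compare against policy_stats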
- opcc.get_qlearning_dataset(env_name, dataset_name)[source]
Retrieves transitions for the given environment and dataset name, flattened into a single dictionary of arrays.
- Parameters:
env_name (str) – name of the environment
dataset_name (str) – name of the dataset
- Returns:
A dictionary of transitions with keys ['observations', 'actions', 'next_observations', 'rewards', 'terminals']
- Return type:
dict
- Example:
>>> import opcc
>>> dataset = opcc.get_qlearning_dataset('Hopper-v2', 'medium')
>>> dataset.keys()
dict_keys(['observations', 'actions', 'next_observations', 'rewards', 'terminals'])
>>> len(dataset['observations'])  # length of dataset
999998
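For training or evaluating value estimators, a transition minibatch can be drawn directly from these flat arrays. A minimal sketch, assuming the dataset values behave as numpy arrays (illustrative, not part of the API):

>>> import numpy as np
>>> import opcc
>>> dataset = opcc.get_qlearning_dataset('Hopper-v2', 'medium')
>>> idx = np.random.randint(0, len(dataset['observations']), size=256)
>>> batch = {key: np.asarray(value)[idx] for key, value in dataset.items()}
>>> batch['observations'].shape[0]
256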
- opcc.get_queries(env_name)[source]
Retrieves queries for the environment.
- Parameters:
env_name (str) – name of the environment
- Returns:
A nested dictionary with the following structure:
{
    (policy_a_args, policy_b_args): {
        'obs_a': list,
        'obs_b': list,
        'action_a': list,
        'action_b': list,
        'target': list,
        'horizon': list,
    }
}
- Return type:
dict
- Example:
>>> import opcc
>>> opcc.get_queries('Hopper-v2')
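The structure above can be traversed directly: each key pairs the constructor arguments of two policies, and each value batches the queries for that pair. A minimal sketch (variable names are illustrative):

>>> import opcc
>>> queries = opcc.get_queries('Hopper-v2')
>>> for (policy_a_args, policy_b_args), batch in queries.items():
...     # 'target' holds the ground-truth label and 'horizon' the
...     # evaluation length for each of the batched queries
...     print(policy_a_args, policy_b_args, len(batch['obs_a']))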
- opcc.get_sequence_dataset(env_name, dataset_name)[source]
Retrieves the episodic (sequence) dataset for the given environment and dataset name.
- Parameters:
env_name (str) – name of the environment
dataset_name (str) – name of the dataset
- Returns:
A list of dictionaries, one per episode. Each episode dictionary contains the keys ['actions', 'next_observations', 'observations', 'rewards', 'terminals', 'timeouts']; some environments add extra 'infos/...' entries, as in the example below.
- Return type:
list[dict]
- Example:
>>> import opcc
>>> dataset = opcc.get_sequence_dataset('Hopper-v2', 'medium')  # list of episode dictionaries
>>> len(dataset)
2186
>>> dataset[0].keys()
dict_keys(['actions', 'infos/action_log_probs', 'infos/qpos', 'infos/qvel', 'next_observations', 'observations', 'rewards', 'terminals', 'timeouts'])
>>> len(dataset[0]['observations'])  # episode length
470
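Because each episode is a self-contained dictionary, per-episode statistics are a one-liner. A minimal sketch computing undiscounted episode returns, assuming numpy-compatible reward arrays:

>>> import numpy as np
>>> import opcc
>>> dataset = opcc.get_sequence_dataset('Hopper-v2', 'medium')
>>> returns = [float(np.sum(episode['rewards'])) for episode in dataset]
>>> len(returns)
2186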