API Reference
- opcc.get_dataset_names(env_name)[source]
Retrieves the list of dataset names available for an environment.
- Parameters:
env_name (str) – name of the environment
- Returns:
A list of dataset names
- Return type:
list[str]
- Example:
>>> import opcc
>>> opcc.get_dataset_names('Hopper-v2')
['random', 'expert', 'medium', 'medium-replay', 'medium-expert']
- opcc.get_env_names()[source]
Retrieves the list of environments for which queries are available.
- Returns:
A list of environment names
- Return type:
list[str]
- Example:
>>> import opcc
>>> opcc.get_env_names()
['HalfCheetah-v2', 'Hopper-v2', 'Walker2d-v2', 'd4rl:maze2d-large-v1', 'd4rl:maze2d-medium-v1', 'd4rl:maze2d-open-v0', 'd4rl:maze2d-umaze-v1']
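Combined with opcc.get_dataset_names(), this makes it straightforward to enumerate every (environment, dataset) pair shipped with the package. A minimal sketch (output omitted; it assumes every listed environment also has registered datasets):

>>> import opcc
>>> for env_name in opcc.get_env_names():
...     for dataset_name in opcc.get_dataset_names(env_name):
...         print(env_name, dataset_name)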
- opcc.get_policy(env_name, pre_trained=1)[source]
Retrieves a pre-trained policy for the environment at the given quality level.
- Parameters:
env_name (str) – name of the environment
pre_trained (int) – pre-trained quality level of the policy. It should be between 1 and 4 (inclusive), where 1 indicates the best model and 4 the worst.
- Returns:
A tuple of two objects: (1) the policy network and (2) a dictionary of performance stats for the policy on the given environment
- Return type:
tuple of (ActorNetwork, dict)
- Example:
>>> import opcc, torch
>>> policy, policy_stats = opcc.get_policy('d4rl:maze2d-open-v0', pre_trained=1)
>>> observation = torch.DoubleTensor([[0.5, 0.5, 0.5, 0.5]])
>>> action = policy(observation)
>>> action
tensor([[0.9977, 0.9998]], dtype=torch.float64, grad_fn=<MulBackward0>)
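The returned stats can be sanity-checked against a fresh rollout. Below is a minimal sketch, not part of the opcc API; it assumes the environment id is accepted by gym.make and that the environment follows the old-style Gym step API (observation, reward, done, info) used by d4rl:

>>> import gym, torch, opcc
>>> policy, policy_stats = opcc.get_policy('d4rl:maze2d-open-v0', pre_trained=1)
>>> env = gym.make('d4rl:maze2d-open-v0')
>>> obs, done, episode_return = env.reset(), False, 0.0
>>> while not done:
...     with torch.no_grad():
...         action = policy(torch.as_tensor(obs, dtype=torch.float64).unsqueeze(0))
...     obs, reward, done, info = env.step(action.squeeze(0).numpy())
...     episode_return += reward
>>> episode_return  # compare against policy_stats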
- opcc.get_qlearning_dataset(env_name, dataset_name)[source]
Retrieves transitions for the given environment and dataset name, flattened into a single dictionary of arrays.
- Parameters:
env_name (str) – name of the environment
dataset_name (str) – name of the dataset
- Returns:
A dictionary of transitions with keys ['observations', 'actions', 'next_observations', 'rewards', 'terminals']
- Return type:
dict
- Example:
>>> import opcc
>>> dataset = opcc.get_qlearning_dataset('Hopper-v2', 'medium')
>>> dataset.keys()
dict_keys(['observations', 'actions', 'next_observations', 'rewards', 'terminals'])
>>> len(dataset['observations'])  # length of dataset
999998
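For training or evaluating value estimators, a transition minibatch can be drawn directly from these flat arrays. A minimal sketch, assuming the dataset values behave as numpy arrays (illustrative, not part of the API):

>>> import numpy as np
>>> import opcc
>>> dataset = opcc.get_qlearning_dataset('Hopper-v2', 'medium')
>>> idx = np.random.randint(0, len(dataset['observations']), size=256)
>>> batch = {key: np.asarray(value)[idx] for key, value in dataset.items()}
>>> batch['observations'].shape[0]
256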
- opcc.get_queries(env_name)[source]
Retrieves queries for the environment.
- Parameters:
env_name (str) – name of the environment
- Returns:
A nested dictionary with the following structure:
{
    (policy_a_args, policy_b_args): {
        'obs_a': list,
        'obs_b': list,
        'action_a': list,
        'action_b': list,
        'target': list,
        'horizon': list,
    }
}
- Return type:
dict
- Example:
>>> import opcc
>>> opcc.get_queries('Hopper-v2')
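The structure above can be traversed directly: each key pairs the constructor arguments of two policies, and each value batches the queries for that pair. A minimal sketch (variable names are illustrative):

>>> import opcc
>>> queries = opcc.get_queries('Hopper-v2')
>>> for (policy_a_args, policy_b_args), batch in queries.items():
...     # 'target' holds the ground-truth label and 'horizon' the
...     # evaluation length for each of the batched queries
...     print(policy_a_args, policy_b_args, len(batch['obs_a']))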
- opcc.get_sequence_dataset(env_name, dataset_name)[source]
Retrieves the episodic (sequence) dataset for the given environment and dataset name.
- Parameters:
env_name (str) – name of the environment
dataset_name (str) – name of the dataset
- Returns:
A list of dictionaries, one per episode. Each episode dictionary contains the keys ['actions', 'next_observations', 'observations', 'rewards', 'terminals', 'timeouts']; some environments add extra 'infos/...' entries, as in the example below.
- Return type:
list[dict]
- Example:
>>> import opcc
>>> dataset = opcc.get_sequence_dataset('Hopper-v2', 'medium')  # list of episode dictionaries
>>> len(dataset)
2186
>>> dataset[0].keys()
dict_keys(['actions', 'infos/action_log_probs', 'infos/qpos', 'infos/qvel', 'next_observations', 'observations', 'rewards', 'terminals', 'timeouts'])
>>> len(dataset[0]['observations'])  # episode length
470
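Because each episode is a self-contained dictionary, per-episode statistics are a one-liner. A minimal sketch computing undiscounted episode returns, assuming numpy-compatible reward arrays:

>>> import numpy as np
>>> import opcc
>>> dataset = opcc.get_sequence_dataset('Hopper-v2', 'medium')
>>> returns = [float(np.sum(episode['rewards'])) for episode in dataset]
>>> len(returns)
2186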