
# Meta Motivo

Meta, FAIR

## Overview

This repository provides a PyTorch implementation and pre-trained models for Meta Motivo. For details, see the paper *Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models*.

## Features

- 6 pretrained FB-CPR models for controlling the humanoid model defined in HumEnv.
- Fully reproducible scripts for evaluating the models in HumEnv.
- Training code for the FB and FB-CPR algorithms.

## Installation

The project can be installed with pip:

```sh
pip install "metamotivo[huggingface,humenv] @ git+https://github.com/facebookresearch/metamotivo.git"
```

It requires Python 3.10+ and has only two core dependencies: `torch >= 2` and `safetensors`. The optional extras pull in `humenv["bench"]` and `huggingface_hub`, used for testing/training and for loading models from the Hugging Face Hub.
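A quick sanity check after installation (a minimal sketch; it only verifies that the package and its `torch` dependency import cleanly):

```python
import torch
import metamotivo  # import succeeds if the installation worked

print(torch.__version__)  # expected to be >= 2.0
```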

## Pretrained models

For reproducibility, we provide all five models (metamotivo-S-X) that we trained to produce the results in the paper; each was trained with a different random seed. We also provide our largest and most performant model (metamotivo-M-1), which can also be tested interactively in our demo.

| Model | # of params | Download |
| --- | --- | --- |
| metamotivo-S-1 | 24.5M | link |
| metamotivo-S-2 | 24.5M | link |
| metamotivo-S-3 | 24.5M | link |
| metamotivo-S-4 | 24.5M | link |
| metamotivo-S-5 | 24.5M | link |
| metamotivo-M-1 | 288M | link |

## Quick start

Once the library is installed, you can easily create an FB-CPR agent and download a pre-trained model from the Hugging Face Hub. Note that the model is an instance of `torch.nn.Module` and is initialized in inference mode by default (`torch.no_grad()` and `eval` mode).

We provide some simple code snippets to demonstrate how to use the model below. For more detailed examples, see our tutorials on interacting with the model, running an evaluation, and training from scratch.

### Download the pre-trained models

The following code snippet shows how to instantiate the model.

```python
from metamotivo.fb_cpr.huggingface import FBcprModel

model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
```
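The model behaves like any other `torch.nn.Module`; for example, you can move it to a device and sample a random context vector (both calls appear again in the policy example below):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Sample one random context vector z; each row encodes a task/behavior.
z = model.sample_z(1)
print(z.shape)
```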

### Download the buffers

For each model we provide:

- the training buffer (which can be used for inference or offline training);
- a small reward-inference buffer (which contains the minimum amount of information needed for reward inference).

The snippet below downloads the smaller buffer and loads it into a `DictBuffer` for sampling:

```python
import h5py
from huggingface_hub import hf_hub_download

from metamotivo.buffers.buffers import DictBuffer

local_dir = "metamotivo-S-1-datasets"
dataset = "buffer_inference_500000.hdf5"  # a smaller buffer that can be used for reward inference
# dataset = "buffer.hdf5"  # the full training buffer of the model
buffer_path = hf_hub_download(
    repo_id="facebook/metamotivo-S-1",
    filename=f"data/{dataset}",
    repo_type="model",
    local_dir=local_dir,
)
hf = h5py.File(buffer_path, "r")
print(hf.keys())

# create a DictBuffer object that can be used for sampling
data = {k: v[:] for k, v in hf.items()}
buffer = DictBuffer(capacity=data["qpos"].shape[0], device="cpu")
buffer.extend(data)
```
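With the buffer in place, you can draw batches for reward inference or offline training. A minimal sketch, assuming `DictBuffer` exposes a `sample(batch_size)` method returning a dict of tensors:

```python
# Draw a batch of transitions; assumes DictBuffer.sample(batch_size) exists.
batch = buffer.sample(256)
print({k: v.shape for k, v in batch.items()})
```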

### The FB-CPR model

The FB-CPR model contains several networks:

- forward net
- backward net
- critic net
- discriminator net
- actor net

We provide functions for evaluating these networks:

```python
def backward_map(self, obs: torch.Tensor) -> torch.Tensor: ...
def forward_map(self, obs: torch.Tensor, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor: ...
def actor(self, obs: torch.Tensor, z: torch.Tensor, std: float) -> torch.Tensor: ...
def critic(self, obs: torch.Tensor, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor: ...
def discriminator(self, obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor: ...
```
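In the forward-backward framework, the Q-function for a task embedding z factorizes as Q(s, a, z) = F(s, a, z)ᵀz, so these maps compose directly. A hedged sketch using the `model` loaded above; the tensor sizes below are placeholders for illustration, not the real model's dimensions:

```python
import torch

# Placeholder sizes for illustration only; they are NOT the real model's dimensions.
obs_dim, action_dim = 358, 69
obs = torch.zeros(1, obs_dim)
action = torch.zeros(1, action_dim)
z = model.sample_z(1)

F = model.forward_map(obs, z, action)  # successor features F(s, a, z)
# Q(s, a, z) = F(s, a, z)^T z. Depending on the architecture, F may carry an
# extra ensemble dimension; the contraction below assumes the last dim matches z.
Q = (F * z).sum(dim=-1)
```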

We also provide simple functions for prompting the model and obtaining a context vector z representing the task to execute.

```python
# reward prompt (standard and weighted regression)
def reward_inference(self, next_obs: torch.Tensor, reward: torch.Tensor, weight: torch.Tensor | None = None) -> torch.Tensor: ...
def reward_wr_inference(self, next_obs: torch.Tensor, reward: torch.Tensor) -> torch.Tensor: ...
# goal prompt
def goal_inference(self, next_obs: torch.Tensor) -> torch.Tensor: ...
# tracking prompt
def tracking_inference(self, next_obs: torch.Tensor) -> torch.Tensor: ...
```
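For example, a goal prompt only needs a single goal observation. A minimal sketch that reuses a state from the buffer downloaded above as the goal (the `"observation"` key and the `buffer.sample` call are assumptions about the buffer layout; check the keys printed earlier):

```python
# Use one stored state as the goal (the "observation" key is an assumption
# about the buffer layout; see hf.keys() above for the actual names).
batch = buffer.sample(1)
goal_obs = batch["observation"]
z = model.goal_inference(next_obs=goal_obs)
```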

Once we have a context vector z we can call the actor to get actions. We provide a function for acting in the environment with a standard interface.

```python
def act(self, obs: torch.Tensor, z: torch.Tensor, mean: bool = True) -> torch.Tensor: ...
```

Note that these functions do not allow gradient computation and run in eval mode (`torch.no_grad()` and `model.eval()`), since they are meant for inference. For training, you should directly access the class attributes; for training we also define target networks for the forward, backward, and critic networks.

### Execute a policy

Here is a minimal example of executing a policy conditioned on a randomly sampled context `z`:

```python
import torch
from gymnasium.wrappers import FlattenObservation, TransformObservation
from humenv import make_humenv

from metamotivo.fb_cpr.huggingface import FBcprModel

device = "cpu"
env, _ = make_humenv(
    num_envs=1,
    wrappers=[
        FlattenObservation,
        lambda env: TransformObservation(
            env,
            lambda obs: torch.tensor(obs.reshape(1, -1), dtype=torch.float32, device=device),
            env.observation_space,  # for gymnasium < 1.0.0, remove this last argument
        ),
    ],
    state_init="Default",
)

model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
model.to(device)
z = model.sample_z(1)
observation, _ = env.reset()
for i in range(10):
    action = model.act(observation, z, mean=True)
    observation, reward, terminated, truncated, info = env.step(action.cpu().numpy().ravel())
```
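`act` returns the policy's mean action by default. A small variant of the loop above, assuming `mean=False` draws a stochastic sample from the actor's action distribution instead:

```python
# Stochastic rollout: assume mean=False samples from the actor's distribution.
for i in range(10):
    action = model.act(observation, z, mean=False)
    observation, reward, terminated, truncated, info = env.step(action.cpu().numpy().ravel())
```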

## Evaluation in HumEnv

To reproduce the results in the paper, we provide a way to evaluate the models using HumEnv, along with wrappers that interface Meta Motivo with the `humenv.bench` reward, goal, and tracking evaluations.

Here is an example of how to use the wrappers for reward evaluation:

```python
import humenv.bench

from metamotivo.fb_cpr.huggingface import FBcprModel
from metamotivo.wrappers.humenvbench import RewardWrapper

model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")

# this enables reward relabeling and context inference
model = RewardWrapper(
    model=model,
    inference_dataset=buffer,  # see above for how to download and create a buffer
    num_samples_per_inference=100_000,
    inference_function="reward_wr_inference",
    max_workers=80,
)
# create the evaluation from humenv
reward_eval = humenv.bench.RewardEvaluation(
    tasks=["move-ego-0-0"],
    env_kwargs={"state_init": "Default"},
    num_contexts=1,
    num_envs=50,
    num_episodes=100,
)
scores = reward_eval.run(model)
```

You can do the same for the other evaluations provided in `humenv.bench`. Please refer to `examples/humenv_evaluation.py` for a full evaluation loop.

## Citation

```bibtex
@article{tirinzoni2024metamotivo,
  title={Zero-shot Whole-Body Humanoid Control via Behavioral Foundation Models},
  author={Tirinzoni, Andrea and Touati, Ahmed and Farebrother, Jesse and Guzek, Mateusz and Kanervisto, Anssi and Xu, Yingchen and Lazaric, Alessandro and Pirotta, Matteo},
  year={2024},
}
```

## License

Meta Motivo is licensed under the CC BY-NC 4.0 license. See LICENSE for details.