mbodied.agents package



mbodied.agents.agent module

class mbodied.agents.agent.Agent(recorder: Literal['omit', 'auto'] | str = 'omit', recorder_kwargs=None, api_key: str = None, model_src=None, model_kwargs=None)

Bases: object

Abstract base class for agents.

This class provides a template for creating agents that can optionally record their actions and observations.


Attributes:

  • recorder – The recorder to record observations and actions.

  • actor – The backend actor to perform actions.

  • Additional arguments to pass to the recorder.

ACTOR_MAP = {'anthropic': <class 'mbodied.agents.backends.anthropic_backend.AnthropicBackend'>, 'gradio': <class 'mbodied.agents.backends.gradio_backend.GradioBackend'>, 'http': <class 'mbodied.agents.backends.httpx_backend.HttpxBackend'>, 'ollama': <class 'mbodied.agents.backends.ollama_backend.OllamaBackend'>, 'openai': <class 'mbodied.agents.backends.openai_backend.OpenAIBackendMixin'>}
act(*args, **kwargs) → Sample

Act based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.act() to perform the actions.

act_and_record(*args, **kwargs) → Sample

Perform an action based on the observation and record the action, if applicable.

Parameters:

  • *args – Additional positional arguments to customize the action.

  • **kwargs – Additional keyword arguments to customize the action.

Returns:

The action sample created by the agent.

Return type:

Sample
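The act-then-record pattern described above can be sketched with a small stand-in class. This is only an illustration: RecordingAgentSketch, EchoAgent, and the in-memory episodes list are hypothetical names, not part of mbodied, and the real recorder may store data very differently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingAgentSketch:
    """Hypothetical stand-in for the documented base class; not mbodied code."""

    # Stand-in for a real recorder: an in-memory list of recorded steps.
    episodes: list = field(default_factory=list)

    def act(self, *args: Any, **kwargs: Any) -> Any:
        # Subclasses provide the actual behavior.
        raise NotImplementedError

    def act_and_record(self, *args: Any, **kwargs: Any) -> Any:
        # Compute the action first, then store it next to the inputs that produced it.
        action = self.act(*args, **kwargs)
        self.episodes.append(
            {"observation": {"args": args, "kwargs": kwargs}, "action": action}
        )
        return action


class EchoAgent(RecordingAgentSketch):
    """Hypothetical subclass that only implements act()."""

    def act(self, instruction: str) -> str:
        return f"echo: {instruction}"


agent = EchoAgent()
result = agent.act_and_record("wave")
print(result)
print(len(agent.episodes))
```

A subclass only has to implement act(); the recording wrapper is inherited unchanged.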


async async_act(*args, **kwargs) → Sample

Act asynchronously based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.async_act() to perform the actions.

async async_act_and_record(*args, **kwargs) → Sample

Act asynchronously based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.async_act() to perform the actions.
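One possible way for a subclass to provide an asynchronous variant is to run its blocking act() in a worker thread. The sketch below assumes Python 3.9+ (for asyncio.to_thread); SyncAgentSketch is a hypothetical class, and a remote backend would more likely await actor.async_act() directly, as the text above suggests.

```python
import asyncio


class SyncAgentSketch:
    """Hypothetical agent with a blocking act(); not mbodied code."""

    def act(self, instruction: str) -> str:
        return instruction.upper()

    async def async_act(self, instruction: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(self.act, instruction)


async def main() -> list:
    agent = SyncAgentSketch()
    # Both requests are awaited concurrently; gather preserves argument order.
    return await asyncio.gather(agent.async_act("left"), agent.async_act("right"))


results = asyncio.run(main())
print(results)
```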

static create_observation_from_args(observation_space, function, args, kwargs) → dict

Helper method to create an observation from the arguments of a function.
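The reference does not describe how this helper builds the observation. As a rough sketch of the general idea, the hypothetical function below binds a call's positional and keyword arguments to parameter names with inspect.signature and keeps only the names listed in a set of observation keys. The function name, the use of a plain set for the observation space, and the filtering rule are all assumptions.

```python
import inspect


def observation_from_call(observation_keys, function, args, kwargs) -> dict:
    """Hypothetical helper: map a call's arguments to named observation fields."""
    bound = inspect.signature(function).bind(*args, **kwargs)
    bound.apply_defaults()
    # Keep only the arguments that the observation space declares.
    return {
        name: value
        for name, value in bound.arguments.items()
        if name in observation_keys
    }


def act(instruction, image=None, temperature=0.0):
    """Placeholder function whose arguments are turned into an observation."""
    return None


observation = observation_from_call(
    {"instruction", "image"},
    act,
    ("pick up the cup",),
    {"image": "scene.jpeg"},
)
print(observation)
```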

static handle_default(model_src: str, model_kwargs: dict) → None

Fall back to the gradio backend, then the httpx backend, if the model source is not recognized.

Parameters:

  • model_src – The model source to use.

  • model_kwargs – The additional arguments to pass to the model.

static init_backend(model_src: str, model_kwargs: dict, api_key: str) → type

Initialize the backend based on the model source.

Parameters:

  • model_src – The model source to use.

  • model_kwargs – The additional arguments to pass to the model.

  • api_key – The API key to use for the remote actor.

Returns:

The backend class to use.

Return type:

type
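The lookup-with-fallback behavior described for init_backend and handle_default can be sketched as a dictionary dispatch followed by an ordered list of fallbacks. Everything here is hypothetical: the stub classes, the rule that the gradio stub only accepts endpoints containing "gradio", and the choice to return an instance rather than a class are assumptions made to keep the example short.

```python
class OpenAIStub:
    def __init__(self, api_key=None, **kwargs):
        self.api_key = api_key
        self.kwargs = kwargs


class GradioStub:
    def __init__(self, endpoint=None, **kwargs):
        # Arbitrary demo rule: only endpoints containing "gradio" are accepted.
        if "gradio" not in str(endpoint):
            raise ValueError(f"not a gradio endpoint: {endpoint!r}")
        self.endpoint = endpoint


class HttpStub:
    def __init__(self, endpoint=None, **kwargs):
        self.endpoint = endpoint


# Hypothetical counterpart of ACTOR_MAP, using the stubs above.
BACKENDS = {"openai": OpenAIStub, "gradio": GradioStub, "http": HttpStub}


def make_backend(model_src, model_kwargs=None, api_key=None):
    kwargs = dict(model_kwargs or {})
    backend_cls = BACKENDS.get(model_src)
    if backend_cls is not None:
        # Recognized source: construct the mapped backend directly.
        return backend_cls(api_key=api_key, **kwargs)
    # Unrecognized source: treat it as an endpoint, trying gradio first, then http.
    for fallback in (GradioStub, HttpStub):
        try:
            return fallback(endpoint=model_src, api_key=api_key, **kwargs)
        except ValueError:
            continue
    raise ValueError(f"no backend available for {model_src!r}")


known = make_backend("openai", api_key="sk-demo")
first = make_backend("https://example.com/gradio-app")
second = make_backend("https://example.com/v1")
print(type(known).__name__, type(first).__name__, type(second).__name__)
```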


load_model(model: str) → None

Load a model from a file or path. Required if the model is a weights path.

Parameters:

  • model – The path to the model file.

Module contents

class mbodied.agents.Agent(recorder: Literal['omit', 'auto'] | str = 'omit', recorder_kwargs=None, api_key: str = None, model_src=None, model_kwargs=None)

Bases: object

Abstract base class for agents.

This class provides a template for creating agents that can optionally record their actions and observations.


Attributes:

  • recorder – The recorder to record observations and actions.

  • actor – The backend actor to perform actions.

  • Additional arguments to pass to the recorder.

ACTOR_MAP = {'anthropic': <class 'mbodied.agents.backends.anthropic_backend.AnthropicBackend'>, 'gradio': <class 'mbodied.agents.backends.gradio_backend.GradioBackend'>, 'http': <class 'mbodied.agents.backends.httpx_backend.HttpxBackend'>, 'ollama': <class 'mbodied.agents.backends.ollama_backend.OllamaBackend'>, 'openai': <class 'mbodied.agents.backends.openai_backend.OpenAIBackendMixin'>}
act(*args, **kwargs) → Sample

Act based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.act() to perform the actions.

act_and_record(*args, **kwargs) → Sample

Perform an action based on the observation and record the action, if applicable.

Parameters:

  • *args – Additional positional arguments to customize the action.

  • **kwargs – Additional keyword arguments to customize the action.

Returns:

The action sample created by the agent.

Return type:

Sample


actor: AnthropicBackend | GradioBackend | OpenAIBackendMixin | HttpxBackend | OllamaBackend

async async_act(*args, **kwargs) → Sample

Act asynchronously based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.async_act() to perform the actions.

async async_act_and_record(*args, **kwargs) → Sample

Act asynchronously based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.async_act() to perform the actions.

static create_observation_from_args(observation_space, function, args, kwargs) → dict

Helper method to create an observation from the arguments of a function.

static handle_default(model_src: str, model_kwargs: dict) → None

Fall back to the gradio backend, then the httpx backend, if the model source is not recognized.

Parameters:

  • model_src – The model source to use.

  • model_kwargs – The additional arguments to pass to the model.

static init_backend(model_src: str, model_kwargs: dict, api_key: str) → type

Initialize the backend based on the model source.

Parameters:

  • model_src – The model source to use.

  • model_kwargs – The additional arguments to pass to the model.

  • api_key – The API key to use for the remote actor.

Returns:

The backend class to use.

Return type:

type


load_model(model: str) → None

Load a model from a file or path. Required if the model is a weights path.

Parameters:

  • model – The path to the model file.

class mbodied.agents.LanguageAgent(model_src: Literal['openai', 'anthropic'] | OpenAIBackendMixin | Url | Path = 'openai', context: list | Image | str | Message = None, api_key: str | None = None, model_kwargs: dict = None, recorder: Literal['default', 'omit'] | str = 'omit', recorder_kwargs: dict = None)

Bases: Agent

An agent that can interact with users using natural language.

This class extends the functionality of a base Agent to handle natural language interactions. It manages memory, dataset recording, and asynchronous remote inference, and supports multiple platforms including OpenAI, Anthropic, and Gradio.


Attributes:

  • reminders – A list of reminders that prompt the agent every n messages.

  • context – The current context of the conversation.

Inherits all attributes from the parent class `Agent`.

Examples:

Basic usage with OpenAI:

>>> cognitive_agent = LanguageAgent(api_key="...", model_src="openai", recorder="default")
>>> cognitive_agent.act("your instruction", image)

Automatically act and record to dataset:

>>> cognitive_agent.act_and_record("your instruction", image)

act(instruction: str, image: Image = None, context: list | str | Image | Message = None, model=None, **kwargs) → str

Responds to the given instruction, image, and context.

Uses the given instruction and image to perform an action.

Parameters:

  • instruction – The instruction to be processed.

  • image – The image to be processed.

  • context – Additional context to include in the response. If context is a list of messages, it will be interpreted as new memory.

  • model – The model to use for the response.

  • **kwargs – Additional keyword arguments.

Returns:

The response to the instruction.

Return type:

str

Example:

>>> agent.act("Hello, world!", Image("scene.jpeg"))
"Hello! What can I do for you today?"
>>> agent.act("Return a plan to pickup the object as a python list.", Image("scene.jpeg"))
"['Move left arm to the object', 'Move right arm to the object']"

act_and_parse(instruction: str, image: Image = None, parse_target: Sample = <class 'mbodied.types.sample.Sample'>, context: list | str | Image | Message = None, model=None, **kwargs) → Sample

Responds to the given instruction, image, and context and parses the response into a Sample object.

async async_act_and_parse(instruction: str, image: Image = None, parse_target: Sample = <class 'mbodied.types.sample.Sample'>, context: list | str | Image | Message = None, model=None, **kwargs) → Sample

Responds to the given instruction, image, and context asynchronously and parses the response into a Sample object.
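The idea of parsing a text response into a structured target can be illustrated with the list-style plan string from the act example. This is a sketch under assumptions, not how mbodied parses responses: PlanSketch is a hypothetical dataclass standing in for a Sample subclass, and ast.literal_eval is used only because that response happens to be a Python list literal.

```python
import ast
from dataclasses import dataclass


@dataclass
class PlanSketch:
    """Hypothetical parse target standing in for a Sample subclass."""

    steps: list


def parse_response(response: str, parse_target=PlanSketch):
    # literal_eval accepts only Python literals, so it never executes model output.
    try:
        value = ast.literal_eval(response)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"cannot parse response: {response!r}") from exc
    if not isinstance(value, list):
        raise ValueError("expected a list of steps")
    return parse_target(steps=value)


plan = parse_response("['Move left arm to the object', 'Move right arm to the object']")
print(plan.steps)
```

A free-form reply such as "Hello! What can I do for you today?" is not a Python literal, so parse_response raises ValueError for it instead of returning a partial result.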

forget(everything=False, last_n: int = -1) → None

Forget the last n messages in the context.

forget_last() → Message

Forget the last message in the context.

history() → List[Message]

Return the conversation history.

remind_every(prompt: str | Image | Message, n: int) → None

Remind the agent of the prompt every n messages.
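The memory methods above can be pictured with a small in-memory model. MemorySketch is hypothetical and makes several assumptions that the reference does not confirm: messages are plain strings, a reminder is appended whenever the number of added messages is a multiple of n, and forget(everything=True) clears the whole context.

```python
from dataclasses import dataclass, field


@dataclass
class MemorySketch:
    """Hypothetical conversation memory; not the LanguageAgent implementation."""

    context: list = field(default_factory=list)
    reminders: list = field(default_factory=list)  # (prompt, n) pairs
    turns: int = 0  # number of messages added through add()

    def remind_every(self, prompt: str, n: int) -> None:
        self.reminders.append((prompt, n))

    def add(self, message: str) -> None:
        self.context.append(message)
        self.turns += 1
        # Assumption: a reminder fires whenever the turn count is a multiple of n.
        for prompt, n in self.reminders:
            if self.turns % n == 0:
                self.context.append(prompt)

    def forget_last(self):
        # Remove and return the most recent entry, or None if the context is empty.
        return self.context.pop() if self.context else None

    def forget(self, everything: bool = False, last_n: int = -1) -> None:
        if everything:
            self.context.clear()
        elif last_n > 0:
            del self.context[-last_n:]

    def history(self) -> list:
        # Return a copy so callers cannot mutate the stored context.
        return list(self.context)


memory = MemorySketch()
memory.remind_every("Answer in JSON.", 2)
for text in ["a", "b", "c"]:
    memory.add(text)
before = memory.history()
memory.forget(last_n=1)
last = memory.forget_last()
after = memory.history()
print(before, last, after)
```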

class mbodied.agents.MotorAgent(recorder: Literal['omit', 'auto'] | str = 'omit', recorder_kwargs=None, api_key: str = None, model_src=None, model_kwargs=None)

Bases: Agent

Abstract base class for motor agents.

Subclassed from Agent, thus possessing the ability to make remote calls, etc.

abstract act(**kwargs) → Motion

Generate a Motion based on given parameters.

Parameters:

  • **kwargs – Arbitrary keyword arguments for the motor agent to act on.

Returns:

A Motion object based on the provided arguments.

Return type:

Motion
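Because act is abstract, a concrete motor agent has to override it before it can be instantiated. The sketch below shows that contract with Python's abc module. MotionSketch, MotorAgentSketch, StepTowardGoal, and the keyword names x, y, goal_x, and goal_y are all hypothetical and are not taken from mbodied.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class MotionSketch:
    """Hypothetical stand-in for Motion: a planar displacement."""

    dx: float = 0.0
    dy: float = 0.0


class MotorAgentSketch(ABC):
    @abstractmethod
    def act(self, **kwargs) -> MotionSketch:
        """Return a motion computed from keyword arguments."""


class StepTowardGoal(MotorAgentSketch):
    def __init__(self, max_step: float = 0.1):
        self.max_step = max_step

    def _clamp(self, value: float) -> float:
        # Limit each axis so one call never moves farther than max_step.
        return max(-self.max_step, min(self.max_step, value))

    def act(self, **kwargs) -> MotionSketch:
        dx = kwargs.get("goal_x", 0.0) - kwargs.get("x", 0.0)
        dy = kwargs.get("goal_y", 0.0) - kwargs.get("y", 0.0)
        return MotionSketch(dx=self._clamp(dx), dy=self._clamp(dy))


motion = StepTowardGoal(max_step=0.1).act(x=0.0, y=0.0, goal_x=0.5, goal_y=-0.05)
print(motion)
```

Instantiating MotorAgentSketch directly raises TypeError, which is how the abstract marker enforces that subclasses supply act.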
