mbodied.agents.language package

Submodules

mbodied.agents.language.language_agent module

Run a LanguageAgent with memory, optional remote acting, and optional automatic dataset creation capabilities.

Explicitly defining your observation and action spaces is always recommended. Either space can be set with a gym.Space object or with any Python object using the Sample class (see examples/using_sample.py for a tutorial). If you prefer, setting recorder="default" lets the recorder infer the spaces for automatic dataset recording.

For example:
>>> agent = LanguageAgent(context=SYSTEM_PROMPT, model_src=backend, recorder="default")
>>> agent.act_and_record("pick up the fork", image)

Alternatively, you can define the recorder separately to record exactly the spaces you want. For example, to record a dataset with an image-and-instruction observation and an AnswerAndActionsList action:

>>> observation_space = spaces.Dict({"image": Image(size=(224, 224)).space(), "instruction": spaces.Text(1000)})
>>> action_space = AnswerAndActionsList(actions=[HandControl()] * 6).space()
>>> recorder = Recorder(
...     'example_recorder',
...     out_dir='saved_datasets',
...     observation_space=observation_space,
...     action_space=action_space,
... )

To record:
>>> recorder.record(
...     observation={
...         "image": image,
...         "instruction": instruction,
...     },
...     action=answer_actions,
... )
class mbodied.agents.language.language_agent.LanguageAgent(model_src: Literal['openai', 'anthropic'] | OpenAIBackendMixin | Url | Path = 'openai', context: list | Image | str | Message = None, api_key: str | None = None, model_kwargs: dict = None, recorder: Literal['default', 'omit'] | str = 'omit', recorder_kwargs: dict = None)[source]

Bases: Agent

An agent that can interact with users using natural language.

This class extends the base Agent to handle natural language interactions. It manages memory, dataset recording, and asynchronous remote inference, and supports multiple platforms, including OpenAI, Anthropic, and Gradio.

reminders

A list of reminders that prompt the agent every n messages.

Type:

List[Reminder]

context

The current context of the conversation.

Type:

List[Message]

Inherits all attributes from the parent class `Agent`.

Examples

Basic usage with OpenAI:

>>> cognitive_agent = LanguageAgent(api_key="...", model_src="openai", recorder="default")
>>> cognitive_agent.act("your instruction", image)

Automatically act and record to a dataset:

>>> cognitive_agent.act_and_record("your instruction", image)

act(instruction: str, image: Image = None, context: list | str | Image | Message = None, model=None, **kwargs) → str[source]

Responds to the given instruction, image, and context.

Uses the given instruction and image to perform an action.

Parameters:
  • instruction – The instruction to be processed.

  • image – The image to be processed.

  • context – Additional context to include in the response. If context is a list of messages, it will be interpreted as new memory.

  • model – The model to use for the response.

  • **kwargs – Additional keyword arguments.

Returns:

The response to the instruction.

Return type:

str
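
Because act returns a plain str, a response that is formatted as a Python list (such as the plan in the example for this method) still has to be parsed before it can be iterated. The following is a minimal sketch using only the standard library; the response string is copied from this page, and the fallback to an empty plan is an assumption made for illustration, not mbodied behavior.

```python
import ast

# Response string as shown in the act() example on this page.
response = "['Move left arm to the object', 'Move right arm to the object']"

# literal_eval only accepts Python literals, so it never executes code
# embedded in a model response.
try:
    plan = ast.literal_eval(response)
except (ValueError, SyntaxError):
    plan = []  # assumption: treat an unparseable response as an empty plan

# Guard against literals that parse but are not lists (e.g. a bare string).
if not isinstance(plan, list):
    plan = []

for step_number, step in enumerate(plan, start=1):
    print(step_number, step)
```

For responses that should map onto a typed object rather than a list, act_and_parse is the documented route.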

Example

>>> agent.act("Hello, world!", Image("scene.jpeg"))
"Hello! What can I do for you today?"
>>> agent.act("Return a plan to pickup the object as a python list.", Image("scene.jpeg"))
"['Move left arm to the object', 'Move right arm to the object']"
act_and_parse(instruction: str, image: Image = None, parse_target: Sample = Sample, context: list | str | Image | Message = None, model=None, **kwargs) → Sample[source]

Responds to the given instruction, image, and context and parses the response into a Sample object.
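
To illustrate what parsing a response into a typed object involves, here is a self-contained sketch that does not import mbodied. GraspPlan and parse_response are hypothetical names, a dataclass stands in for a Sample subclass, and the response is assumed to be JSON; the library's actual parsing logic may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class GraspPlan:
    """Hypothetical stand-in for a Sample subclass used as parse_target."""
    answer: str
    steps: list


def parse_response(response: str, parse_target=GraspPlan):
    """Parse a JSON object string into parse_target, raising ValueError on bad input."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    try:
        return parse_target(**data)
    except TypeError as exc:
        # Raised when keys are missing or unexpected for the target type.
        raise ValueError(f"response does not match {parse_target.__name__}: {exc}") from exc


raw = '{"answer": "Picking up the fork.", "steps": ["reach", "grasp", "lift"]}'
parsed = parse_response(raw)
print(parsed.steps)
```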

async async_act_and_parse(instruction: str, image: Image = None, parse_target: Sample = Sample, context: list | str | Image | Message = None, model=None, **kwargs) → Sample[source]

Responds to the given instruction, image, and context asynchronously and parses the response into a Sample object.
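
One common way to offer an asynchronous variant of a blocking call is to run it in a worker thread. The sketch below shows that pattern with the standard library only. blocking_act and async_act are hypothetical stand-ins, and whether async_act_and_parse is implemented this way is an assumption.

```python
import asyncio
import time


def blocking_act(instruction: str) -> str:
    """Hypothetical stand-in for a blocking model call."""
    time.sleep(0.1)  # simulate network latency
    return f"done: {instruction}"


async def async_act(instruction: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_act, instruction)


async def main() -> list:
    # The two calls overlap instead of running back to back;
    # gather returns results in the order the awaitables were passed.
    return await asyncio.gather(
        async_act("pick up the fork"),
        async_act("open the drawer"),
    )


results = asyncio.run(main())
print(results)
```

asyncio.to_thread requires Python 3.9 or later.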

forget(everything=False, last_n: int = -1) → None[source]

Forget the last n messages in the context.

forget_last() → Message[source]

Forget the last message in the context and return it.
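
The two forgetting methods can be pictured as operations on a list of messages. The sketch below uses a hypothetical ContextBuffer class with plain strings as messages. How the library treats the default last_n=-1 is not stated on this page, so the choice to remove nothing for non-positive values is an assumption.

```python
class ContextBuffer:
    """Hypothetical stand-in for the agent's message list."""

    def __init__(self, messages):
        self.context = list(messages)

    def forget_last(self):
        # Remove and return the most recent message, or None if there is none.
        return self.context.pop() if self.context else None

    def forget(self, everything=False, last_n=-1):
        if everything:
            self.context.clear()
            return
        # Assumption: a non-positive last_n removes nothing.
        for _ in range(max(last_n, 0)):
            if not self.context:
                break
            self.context.pop()


buffer = ContextBuffer(["a", "b", "c", "d"])
last = buffer.forget_last()       # removes "d"
buffer.forget(last_n=2)           # removes "c" and "b"
remaining = list(buffer.context)
buffer.forget(everything=True)    # clears what is left
print(last, remaining, buffer.context)
```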

history() → List[Message][source]

Return the conversation history.

remind_every(prompt: str | Image | Message, n: int) → None[source]

Remind the agent of the prompt every n messages.

class mbodied.agents.language.language_agent.Reminder(prompt: str | Image | Message, n: int)[source]

Bases: object

A reminder to show the agent a prompt every n messages.

n: int
prompt: str | Image | Message
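
How the agent schedules reminders internally is not described here. One plausible reading of "every n messages" is that a reminder fires whenever the message count is a multiple of n. The sketch below illustrates that reading only; ReminderSketch and due_reminders are hypothetical names, and prompts are limited to strings for brevity.

```python
from dataclasses import dataclass


@dataclass
class ReminderSketch:
    """Hypothetical stand-in mirroring the prompt and n fields of Reminder."""
    prompt: str
    n: int


def due_reminders(reminders, message_count):
    """Return the prompts whose interval divides the current message count."""
    if message_count <= 0:
        return []
    return [r.prompt for r in reminders if r.n > 0 and message_count % r.n == 0]


reminders = [
    ReminderSketch("Answer in JSON.", 2),
    ReminderSketch("Stay within the workspace.", 3),
]

# Map each message count from 1 to 6 to the reminders that would fire.
schedule = {count: due_reminders(reminders, count) for count in range(1, 7)}
for count, prompts in schedule.items():
    print(count, prompts)
```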
mbodied.agents.language.language_agent.make_context_list(context: list[str | Image | Message] | Image | str | Message | None) → List[Message][source]

Convert the context to a list of messages.
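
The signature accepts None, a single item, or a list of mixed items, so the conversion is essentially a normalization step. The following sketch shows one way such a normalization could look. MessageSketch and to_message_list are hypothetical, the "user" role is an assumed default, and the real function may produce different messages (for example, extra acknowledgement turns).

```python
from dataclasses import dataclass


@dataclass
class MessageSketch:
    """Hypothetical stand-in for Message: a role plus a list of content items."""
    role: str
    content: list


def to_message_list(context):
    """Normalize None, a single item, or a mixed list into a list of messages."""
    if context is None:
        return []
    if isinstance(context, MessageSketch):
        return [context]
    if isinstance(context, list):
        # Keep existing messages as they are and wrap everything else.
        return [
            item if isinstance(item, MessageSketch) else MessageSketch("user", [item])
            for item in context
        ]
    # A single string (or image-like object) becomes one user message.
    return [MessageSketch("user", [context])]


existing = MessageSketch("assistant", ["Understood."])
normalized = to_message_list(["You are a robot.", existing])
print(normalized)
```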

Module contents

class mbodied.agents.language.LanguageAgent(model_src: Literal['openai', 'anthropic'] | OpenAIBackendMixin | Url | Path = 'openai', context: list | Image | str | Message = None, api_key: str | None = None, model_kwargs: dict = None, recorder: Literal['default', 'omit'] | str = 'omit', recorder_kwargs: dict = None)[source]

Bases: Agent

The same class as mbodied.agents.language.language_agent.LanguageAgent, exposed at the package level. Its attributes, examples, and methods are documented in full under the language_agent module above.