mbodied.agents.language package
Submodules
mbodied.agents.language.language_agent module
Run a LanguageAgent with memory, optional remote acting, and optional automatic dataset creation capabilities.
While it is always recommended to explicitly define your observation and action spaces, which can be set with a gym.Space object or any Python object using the Sample class (see examples/using_sample.py for a tutorial), you can have the recorder infer the spaces by setting recorder="default" for automatic dataset recording.
- For example:
>>> agent = LanguageAgent(context=SYSTEM_PROMPT, model_src=backend, recorder="default")
>>> agent.act_and_record("pick up the fork", image)
Alternatively, you can define the recorder separately to record the space you want. For example, to record the dataset with the image and instruction observation and AnswerAndActionsList as action:
>>> observation_space = spaces.Dict({"image": Image(size=(224, 224)).space(), "instruction": spaces.Text(1000)})
>>> action_space = AnswerAndActionsList(actions=[HandControl()] * 6).space()
>>> recorder = Recorder(
... 'example_recorder',
... out_dir='saved_datasets',
... observation_space=observation_space,
... action_space=action_space,
... )
- To record:
>>> recorder.record(
... observation={
... "image": image,
... "instruction": instruction,
... },
... action=answer_actions,
... )
- class mbodied.agents.language.language_agent.LanguageAgent(model_src: Literal['openai', 'anthropic'] | OpenAIBackendMixin | Url | Path = 'openai', context: list | Image | str | Message = None, api_key: str | None = None, model_kwargs: dict = None, recorder: Literal['default', 'omit'] | str = 'omit', recorder_kwargs: dict = None)[source]
Bases:
Agent
An agent that can interact with users using natural language.
This class extends the functionality of a base Agent to handle natural language interactions. It manages memory, dataset-recording, and asynchronous remote inference, supporting multiple platforms including OpenAI, Anthropic, and Gradio.
- Inherits all attributes from the parent class `Agent`.
Examples
- Basic usage with OpenAI:
>>> cognitive_agent = LanguageAgent(api_key="...", model_src="openai", recorder="default")
>>> cognitive_agent.act("your instruction", image)
- Automatically act and record to dataset:
>>> cognitive_agent.act_and_record("your instruction", image)
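- Using the Anthropic backend instead (a minimal sketch based on the signature above; the placeholder API key and reuse of the same image variable are illustrative):
>>> cognitive_agent = LanguageAgent(api_key="...", model_src="anthropic")
>>> cognitive_agent.act("your instruction", image)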
- act(instruction: str, image: Image = None, context: list | str | Image | Message = None, model=None, **kwargs) → str[source]
Responds to the given instruction, image, and context.
Uses the given instruction and image to perform an action.
- Parameters:
instruction – The instruction to be processed.
image – The image to be processed.
context – Additional context to include in the response. If context is a list of messages, it will be interpreted as new memory.
model – The model to use for the response.
**kwargs – Additional keyword arguments.
- Returns:
The response to the instruction.
- Return type:
str
Example
>>> agent.act("Hello, world!", Image("scene.jpeg")) "Hello! What can I do for you today?" >>> agent.act("Return a plan to pickup the object as a python list.", Image("scene.jpeg")) "['Move left arm to the object', 'Move right arm to the object']"
- act_and_parse(instruction: str, image: ~mbodied.types.sense.vision.Image = None, parse_target: ~mbodied.types.sample.Sample = <class 'mbodied.types.sample.Sample'>, context: list | str | ~mbodied.types.sense.vision.Image | ~mbodied.types.message.Message = None, model=None, **kwargs) → Sample[source]
Responds to the given instruction, image, and context and parses the response into a Sample object.
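- Example (a minimal sketch, assuming the AnswerAndActionsList type from the recorder example above is available and that the result is an instance of the given parse_target):
>>> parsed = agent.act_and_parse("pick up the fork", image, parse_target=AnswerAndActionsList)
>>> isinstance(parsed, AnswerAndActionsList)
True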
- async async_act_and_parse(instruction: str, image: ~mbodied.types.sense.vision.Image = None, parse_target: ~mbodied.types.sample.Sample = <class 'mbodied.types.sample.Sample'>, context: list | str | ~mbodied.types.sense.vision.Image | ~mbodied.types.message.Message = None, model=None, **kwargs) → Sample[source]
Responds to the given instruction, image, and context asynchronously and parses the response into a Sample object.
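- Example (a minimal sketch, assuming the coroutine is driven with asyncio.run outside of an existing event loop; the parse_target is the same illustrative type as above):
>>> import asyncio
>>> parsed = asyncio.run(agent.async_act_and_parse("pick up the fork", image, parse_target=AnswerAndActionsList))
>>> isinstance(parsed, AnswerAndActionsList)
True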
Module contents
- class mbodied.agents.language.LanguageAgent(model_src: Literal['openai', 'anthropic'] | OpenAIBackendMixin | Url | Path = 'openai', context: list | Image | str | Message = None, api_key: str | None = None, model_kwargs: dict = None, recorder: Literal['default', 'omit'] | str = 'omit', recorder_kwargs: dict = None)[source]
Bases:
Agent
An agent that can interact with users using natural language.
This class extends the functionality of a base Agent to handle natural language interactions. It manages memory, dataset-recording, and asynchronous remote inference, supporting multiple platforms including OpenAI, Anthropic, and Gradio.
- Inherits all attributes from the parent class `Agent`.
Examples
- Basic usage with OpenAI:
>>> cognitive_agent = LanguageAgent(api_key="...", model_src="openai", recorder="default")
>>> cognitive_agent.act("your instruction", image)
- Automatically act and record to dataset:
>>> cognitive_agent.act_and_record("your instruction", image)
- act(instruction: str, image: Image = None, context: list | str | Image | Message = None, model=None, **kwargs) → str[source]
Responds to the given instruction, image, and context.
Uses the given instruction and image to perform an action.
- Parameters:
instruction – The instruction to be processed.
image – The image to be processed.
context – Additional context to include in the response. If context is a list of messages, it will be interpreted as new memory.
model – The model to use for the response.
**kwargs – Additional keyword arguments.
- Returns:
The response to the instruction.
- Return type:
str
Example
>>> agent.act("Hello, world!", Image("scene.jpeg")) "Hello! What can I do for you today?" >>> agent.act("Return a plan to pickup the object as a python list.", Image("scene.jpeg")) "['Move left arm to the object', 'Move right arm to the object']"
- act_and_parse(instruction: str, image: ~mbodied.types.sense.vision.Image = None, parse_target: ~mbodied.types.sample.Sample = <class 'mbodied.types.sample.Sample'>, context: list | str | ~mbodied.types.sense.vision.Image | ~mbodied.types.message.Message = None, model=None, **kwargs) → Sample[source]
Responds to the given instruction, image, and context and parses the response into a Sample object.
- async async_act_and_parse(instruction: str, image: ~mbodied.types.sense.vision.Image = None, parse_target: ~mbodied.types.sample.Sample = <class 'mbodied.types.sample.Sample'>, context: list | str | ~mbodied.types.sense.vision.Image | ~mbodied.types.message.Message = None, model=None, **kwargs) → Sample[source]
Responds to the given instruction, image, and context asynchronously and parses the response into a Sample object.