mbodied.agents.sense.audio package
Submodules
mbodied.agents.sense.audio.audio_agent module
- class mbodied.agents.sense.audio.audio_agent.AudioAgent(listen_filename: str = 'tmp_listen.wav', tmp_speak_filename: str = 'tmp_speak.mp3', use_pyaudio: bool = True, client: OpenAI = None, api_key: str = None)
Bases: Agent
Handles audio recording, playback, and speech-to-text transcription.
This class uses OpenAI's API to transcribe audio input and synthesize speech. Set the environment variable NO_AUDIO=1 to disable audio recording and playback; input is then read from the terminal instead.
- Usage:
audio_agent = AudioAgent(api_key="your-openai-api-key", use_pyaudio=False)
audio_agent.speak("How can I help you?")
message = audio_agent.listen()
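The NO_AUDIO fallback described above can be sketched as follows. This is a minimal illustration only: the helper `get_user_input` and its callable arguments are hypothetical stand-ins, not part of the library; only the NO_AUDIO=1 environment variable comes from the documentation.

```python
import os

def get_user_input(record_audio, read_terminal):
    # Hypothetical helper mirroring the documented NO_AUDIO behavior:
    # when NO_AUDIO=1 is set, skip recording/playback and read from
    # the terminal instead.
    if os.environ.get("NO_AUDIO") == "1":
        return read_terminal()
    return record_audio()

os.environ["NO_AUDIO"] = "1"
message = get_user_input(
    record_audio=lambda: "transcribed speech",   # stands in for microphone capture
    read_terminal=lambda: "typed text",          # stands in for terminal input
)
# With NO_AUDIO=1 set, message == "typed text"
```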
- act(*args, **kwargs)
Act based on the observation.
Subclasses should implement this method.
For remote actors, this method should call actor.act() to perform the action.
- listen(keep_audio: bool = False, mode: str = 'speak') → str
Listens for audio input and transcribes it using OpenAI’s API.
- Parameters:
keep_audio – Whether to keep the recorded audio file.
mode – The mode of input (speak, type, speak_or_type).
- Returns:
The transcribed text from the audio input.
- mode
alias of Literal['speak', 'type', 'speak_or_type']
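As a rough sketch of how the three documented `mode` values might dispatch, consider the following. The function `resolve_input`, its callables, and the speak_or_type fallback order are all assumptions for illustration, not the library's actual implementation; only the three mode names come from the documentation.

```python
from typing import Callable, Literal

Mode = Literal["speak", "type", "speak_or_type"]

def resolve_input(mode: Mode,
                  record: Callable[[], str],
                  type_in: Callable[[], str]) -> str:
    # Hypothetical dispatch over the three documented modes.
    if mode == "speak":
        return record()      # record and transcribe audio
    if mode == "type":
        return type_in()     # read typed input only
    if mode == "speak_or_type":
        # Assumed behavior: prefer typed input, fall back to audio.
        typed = type_in()
        return typed if typed else record()
    raise ValueError(f"unknown mode: {mode}")

result = resolve_input("type", lambda: "audio", lambda: "typed")
# result == "typed"
```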
Module contents
- class mbodied.agents.sense.audio.AudioAgent(listen_filename: str = 'tmp_listen.wav', tmp_speak_filename: str = 'tmp_speak.mp3', use_pyaudio: bool = True, client: OpenAI = None, api_key: str = None)
Bases: Agent
Handles audio recording, playback, and speech-to-text transcription.
This class uses OpenAI's API to transcribe audio input and synthesize speech. Set the environment variable NO_AUDIO=1 to disable audio recording and playback; input is then read from the terminal instead.
- Usage:
audio_agent = AudioAgent(api_key="your-openai-api-key", use_pyaudio=False)
audio_agent.speak("How can I help you?")
message = audio_agent.listen()
- act(*args, **kwargs)
Act based on the observation.
Subclasses should implement this method.
For remote actors, this method should call actor.act() to perform the action.
- actor: Backend
- listen(keep_audio: bool = False, mode: str = 'speak') → str
Listens for audio input and transcribes it using OpenAI’s API.
- Parameters:
keep_audio – Whether to keep the recorded audio file.
mode – The mode of input (speak, type, speak_or_type).
- Returns:
The transcribed text from the audio input.
- mode
alias of Literal['speak', 'type', 'speak_or_type']