mbodied.agents.sense.audio package

Submodules

mbodied.agents.sense.audio.audio_agent module

class mbodied.agents.sense.audio.audio_agent.AudioAgent(listen_filename: str = 'tmp_listen.wav', tmp_speak_filename: str = 'tmp_speak.mp3', use_pyaudio: bool = True, client: OpenAI = None, api_key: str = None)[source]

Bases: Agent

Handles audio recording, playback, and speech-to-text transcription.

This module uses OpenAI's API to transcribe audio input and synthesize speech. Set the environment variable NO_AUDIO=1 to disable audio recording and playback; input will then be read from the terminal instead.
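For instance, a minimal sketch of running with audio disabled (whether NO_AUDIO is read at import time or call time is an assumption; exporting it in the shell before launching the process is the safer route):

import os

os.environ["NO_AUDIO"] = "1"  # disable recording/playback; fall back to terminal I/O (assumes the variable is read after this point)

from mbodied.agents.sense.audio.audio_agent import AudioAgent

audio_agent = AudioAgent()
message = audio_agent.listen()  # reads a typed line from the terminal instead of recording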

Usage:

audio_agent = AudioAgent(api_key="your-openai-api-key", use_pyaudio=False)
audio_agent.speak("How can I help you?")
message = audio_agent.listen()

act(*args, **kwargs)[source]

Act based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.act() to perform the actions.

listen(keep_audio: bool = False, mode: str = 'speak') → str[source]

Listens for audio input and transcribes it using OpenAI's API.

Parameters:
  • keep_audio – Whether to keep the recorded audio file.

  • mode – The mode of input: 'speak', 'type', or 'speak_or_type'.

Returns:

The transcribed text from the audio input.
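A hedged sketch of the three input modes (the 'speak_or_type' fallback behavior is an assumption based on the name):

audio_agent = AudioAgent(api_key="your-openai-api-key")

# Record from the microphone, transcribe, and keep the WAV file on disk.
text = audio_agent.listen(keep_audio=True, mode="speak")

# Skip the microphone and read typed input from the terminal.
text = audio_agent.listen(mode="type")

# Presumably accepts either spoken or typed input (assumption).
text = audio_agent.listen(mode="speak_or_type")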

mode

alias of Literal['speak', 'type', 'speak_or_type']

play_audio(filename: str) → None[source]

Plays an audio file.

Parameters:
  • filename – The filename of the audio file to play.

record_audio() → None[source]

Records audio from the microphone and saves it to a file.
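A round-trip sketch of the two file helpers, assuming record_audio writes to the listen_filename given at construction:

audio_agent = AudioAgent(listen_filename="tmp_listen.wav")

audio_agent.record_audio()               # capture microphone input to tmp_listen.wav (assumption)
audio_agent.play_audio("tmp_listen.wav") # play the recording back for verification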

speak(message: str, voice: str = 'onyx', api_key: str = None) → None[source]

Synthesizes speech from text using OpenAI's API and plays it back.

Parameters:
  • message – The text message to synthesize.

  • voice – The voice model to use for synthesis.

  • api_key – The API key for OpenAI.
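A minimal sketch of playback with the default and an overridden voice ('onyx' is the default from the signature; 'alloy' is another OpenAI voice name, used here as an assumption about the accepted values):

audio_agent = AudioAgent(api_key="your-openai-api-key")

# Synthesize with the default 'onyx' voice and play back immediately.
audio_agent.speak("Task complete.")

# Override the voice for a single utterance.
audio_agent.speak("Task complete.", voice="alloy")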

Module contents

class mbodied.agents.sense.audio.AudioAgent(listen_filename: str = 'tmp_listen.wav', tmp_speak_filename: str = 'tmp_speak.mp3', use_pyaudio: bool = True, client: OpenAI = None, api_key: str = None)[source]

Bases: Agent

Handles audio recording, playback, and speech-to-text transcription.

This module uses OpenAI's API to transcribe audio input and synthesize speech. Set the environment variable NO_AUDIO=1 to disable audio recording and playback; input will then be read from the terminal instead.

Usage:

audio_agent = AudioAgent(api_key="your-openai-api-key", use_pyaudio=False)
audio_agent.speak("How can I help you?")
message = audio_agent.listen()

act(*args, **kwargs)[source]

Act based on the observation.

Subclasses should implement this method.

For remote actors, this method should call actor.act() to perform the actions.

actor: Backend

listen(keep_audio: bool = False, mode: str = 'speak') → str[source]

Listens for audio input and transcribes it using OpenAI's API.

Parameters:
  • keep_audio – Whether to keep the recorded audio file.

  • mode – The mode of input: 'speak', 'type', or 'speak_or_type'.

Returns:

The transcribed text from the audio input.

mode

alias of Literal['speak', 'type', 'speak_or_type']

play_audio(filename: str) → None[source]

Plays an audio file.

Parameters:
  • filename – The filename of the audio file to play.

record_audio() → None[source]

Records audio from the microphone and saves it to a file.

speak(message: str, voice: str = 'onyx', api_key: str = None) → None[source]

Synthesizes speech from text using OpenAI's API and plays it back.

Parameters:
  • message – The text message to synthesize.

  • voice – The voice model to use for synthesis.

  • api_key – The API key for OpenAI.