mbodied.agents.sense.audio package
Submodules
mbodied.agents.sense.audio.audio_agent module
- class mbodied.agents.sense.audio.audio_agent.AudioAgent(listen_filename: str = 'tmp_listen.wav', tmp_speak_filename: str = 'tmp_speak.mp3', use_pyaudio: bool = True, client: OpenAI = None, api_key: str = None)
Bases: Agent
Handles audio recording, playback, and speech-to-text transcription.
This class uses OpenAI's API to transcribe audio input and synthesize speech. Set the environment variable NO_AUDIO=1 to disable audio recording and playback; input is then read from the terminal instead.
- Usage:
audio_agent = AudioAgent(api_key="your-openai-api-key", use_pyaudio=False)
audio_agent.speak("How can I help you?")
message = audio_agent.listen()
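The NO_AUDIO fallback described above can be sketched as follows. This is a minimal illustration only: the helper `get_user_input` and its callable arguments are hypothetical stand-ins, not part of the library; only the NO_AUDIO=1 environment variable comes from the documentation.

```python
import os

def get_user_input(record_audio, read_terminal):
    # Hypothetical helper mirroring the documented NO_AUDIO behavior:
    # when NO_AUDIO=1 is set, skip recording/playback and read from
    # the terminal instead.
    if os.environ.get("NO_AUDIO") == "1":
        return read_terminal()
    return record_audio()

os.environ["NO_AUDIO"] = "1"
message = get_user_input(
    record_audio=lambda: "transcribed speech",   # stands in for microphone capture
    read_terminal=lambda: "typed text",          # stands in for terminal input
)
# With NO_AUDIO=1 set, message == "typed text"
```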
- act(*args, **kwargs)
Act based on the observation.
Subclasses should implement this method.
For remote actors, this method should call actor.act() to perform the action.
- listen(keep_audio: bool = False, mode: str = 'speak') → str
Listens for audio input and transcribes it using OpenAI’s API.
- Parameters:
keep_audio – Whether to keep the recorded audio file.
mode – The mode of input (speak, type, speak_or_type).
- Returns:
The transcribed text from the audio input.
- mode
alias of Literal['speak', 'type', 'speak_or_type']
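As a rough sketch of how the three documented `mode` values might dispatch, consider the following. The function `resolve_input`, its callables, and the speak_or_type fallback order are all assumptions for illustration, not the library's actual implementation; only the three mode names come from the documentation.

```python
from typing import Callable, Literal

Mode = Literal["speak", "type", "speak_or_type"]

def resolve_input(mode: Mode,
                  record: Callable[[], str],
                  type_in: Callable[[], str]) -> str:
    # Hypothetical dispatch over the three documented modes.
    if mode == "speak":
        return record()      # record and transcribe audio
    if mode == "type":
        return type_in()     # read typed input only
    if mode == "speak_or_type":
        # Assumed behavior: prefer typed input, fall back to audio.
        typed = type_in()
        return typed if typed else record()
    raise ValueError(f"unknown mode: {mode}")

result = resolve_input("type", lambda: "audio", lambda: "typed")
# result == "typed"
```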
Module contents
- class mbodied.agents.sense.audio.AudioAgent(listen_filename: str = 'tmp_listen.wav', tmp_speak_filename: str = 'tmp_speak.mp3', use_pyaudio: bool = True, client: OpenAI = None, api_key: str = None)
Bases: Agent
Handles audio recording, playback, and speech-to-text transcription.
This class uses OpenAI's API to transcribe audio input and synthesize speech. Set the environment variable NO_AUDIO=1 to disable audio recording and playback; input is then read from the terminal instead.
- Usage:
audio_agent = AudioAgent(api_key="your-openai-api-key", use_pyaudio=False)
audio_agent.speak("How can I help you?")
message = audio_agent.listen()
- act(*args, **kwargs)
Act based on the observation.
Subclasses should implement this method.
For remote actors, this method should call actor.act() to perform the action.
- actor: Backend
- listen(keep_audio: bool = False, mode: str = 'speak') → str
Listens for audio input and transcribes it using OpenAI’s API.
- Parameters:
keep_audio – Whether to keep the recorded audio file.
mode – The mode of input (speak, type, speak_or_type).
- Returns:
The transcribed text from the audio input.
- mode
alias of Literal['speak', 'type', 'speak_or_type']