We have created a Discord bot in Python that listens to users in a voice call. When prompted by a command, it records the user's audio, transcribes it using OpenAI's Whisper, generates a response using GPT-3, converts that response to speech using the Uberduck API, and finally plays the response audio back into the Discord voice call. While there is room to improve our implementation, we think the project has quite a few uses, from voice call moderation to accessibility and more. We plan to continue developing it to a more polished state, where it can be used reliably in other Discord servers.
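The flow described above (recorded audio, then transcription, then a generated reply, then synthesized speech) can be sketched as a small pipeline. This is only an illustrative outline, not our actual bot code: the `VoicePipeline` class and its three stage names are hypothetical, and each stage is passed in as a plain function so that real wrappers around Whisper, GPT-3, and Uberduck could be swapped in without changing the flow. Recording from and playing back into the Discord voice call are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical outline of the bot's processing chain.

    Each stage is an injected callable, so no specific API is assumed here.
    """
    transcribe: Callable[[bytes], str]    # stand-in for a Whisper wrapper
    generate_reply: Callable[[str], str]  # stand-in for a GPT-3 wrapper
    synthesize: Callable[[str], bytes]    # stand-in for an Uberduck TTS wrapper

    def run(self, recorded_audio: bytes) -> bytes:
        """Turn a user's recorded audio into reply audio."""
        text = self.transcribe(recorded_audio)
        if not text.strip():
            # Nothing intelligible was said, so skip the later stages.
            raise ValueError("no speech detected in the recording")
        reply = self.generate_reply(text)
        return self.synthesize(reply)


if __name__ == "__main__":
    # Stub stages for demonstration only; they do not call any external service.
    demo = VoicePipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        generate_reply=lambda text: f"You said: {text}",
        synthesize=lambda reply: reply.encode("utf-8"),
    )
    print(demo.run(b"hello bot"))
```

Keeping the stages as injected functions means each service can be tested or replaced on its own, which may help when moving the bot toward a more reliable state.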
Join us for a hackathon where we will be using OpenAI Whisper to create innovative solutions! Whisper is a neural net that has been trained to approach human-level robustness and accuracy for English speech recognition. We will be using this tool to create applications that can transcribe speech in multiple languages, as well as translate from those languages into English. This will be a great opportunity for you to learn more about speech processing and to create some useful applications!