Whisper: AI Audio Transcription by OpenAI


Disclaimer: This post has been translated to English using a machine translation model. Please, let me know if you find any mistakes.

Introduction

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Using such a large and diverse dataset leads to greater robustness against accents, background noise, and technical language. Additionally, it allows for transcription in multiple languages as well as translation of those languages into English.

Website

Paper

GitHub

Model card

Installation

To install this tool, it's best to create a new Anaconda environment.

```python
!conda create -n whisper
```

We activate the environment

```python
!conda activate whisper
```

We install all the necessary packages

```python
!conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
```

Next, we install Whisper

```python
!pip install git+https://github.com/openai/whisper.git
```

And we install ffmpeg, which Whisper uses to decode audio files

```python
!sudo apt update && sudo apt install ffmpeg
```

Usage

We import whisper

```python
import whisper
```

We select the model. The larger the model, the better it will transcribe, but the more memory it will need and the slower it will run.

```python
# model = "tiny"
# model = "base"
# model = "small"
# model = "medium"
model = "large"
model = whisper.load_model(model)
```
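To give a sense of the trade-off, here is a small sketch listing the approximate parameter counts of each model size, as reported in the Whisper model card; treat the numbers as rough figures rather than exact values.

```python
# Approximate parameter counts per Whisper model size
# (figures from the OpenAI Whisper model card; rough values).
model_params = {
    "tiny": 39e6,
    "base": 74e6,
    "small": 244e6,
    "medium": 769e6,
    "large": 1550e6,
}

# Larger models transcribe better but need more VRAM and run slower.
for name, params in model_params.items():
    print(f"{name:>6}: ~{params / 1e6:.0f}M parameters")
```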

We load the audio from this old Micro Machines ad (from 1987)

```python
audio_path = "MicroMachines.mp3"
audio = whisper.load_audio(audio_path)
audio = whisper.pad_or_trim(audio)
```
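Whisper works on fixed 30-second windows of 16 kHz audio, so `pad_or_trim` clips longer audio and zero-pads shorter audio to exactly 480,000 samples. As a rough illustration of what it does (not the library's actual implementation), here is a minimal sketch with NumPy:

```python
import numpy as np

SAMPLE_RATE = 16_000               # Whisper expects 16 kHz audio
CHUNK_SAMPLES = 30 * SAMPLE_RATE   # 30-second window = 480,000 samples

def pad_or_trim_sketch(audio: np.ndarray, length: int = CHUNK_SAMPLES) -> np.ndarray:
    """Rough reimplementation of whisper.pad_or_trim, for illustration only."""
    if len(audio) > length:
        return audio[:length]                            # trim long audio
    return np.pad(audio, (0, length - len(audio)))       # zero-pad short audio

short = pad_or_trim_sketch(np.ones(10))
long = pad_or_trim_sketch(np.ones(CHUNK_SAMPLES + 5))
print(short.shape, long.shape)  # both (480000,)
```

Note that this also means anything beyond the first 30 seconds of the file is discarded by this low-level pipeline.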
We compute the log-Mel spectrogram of the audio and move it to the model's device

```python
mel = whisper.log_mel_spectrogram(audio).to(model.device)
```
We detect the language of the audio

```python
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
```

Output:

```
Detected language: en
```
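`detect_language` returns a dictionary mapping each language code to its probability, and `max(probs, key=probs.get)` simply picks the key with the highest value. With hypothetical probabilities (made up here for illustration), the idiom works like this:

```python
# Hypothetical probabilities, shaped like those returned by
# model.detect_language: language code -> probability.
probs = {"en": 0.92, "es": 0.05, "fr": 0.02, "de": 0.01}

# max with key=probs.get returns the key whose value is largest.
detected = max(probs, key=probs.get)
print(detected)  # en
```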
We decode the audio with the default decoding options

```python
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
```
And we inspect the transcription

```python
result.text
```

Output:

```
"This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines. Each one has dramatic details, terrific trim, precision paint jobs, plus incredible micro machine pocket play sets. There's a police station, fire station, restaurant, service station, and more. Perfect pocket portables to take any place. And there are many miniature play sets to play with and each one comes with its own special edition micro machine vehicle and fun fantastic features that miraculously move. Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge. And these play sets fit together to form a micro machine world. Micro machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all. Micro machines and micro machine pocket play sets sold separately from Galoob. The smaller they are, the better they are."
```
