Fatos Halimi
September 28, 2022

Free Speech-to-Text AI Tool for Transcription Using Google Colab


Whisper is OpenAI’s automatic speech recognition (ASR) system, trained on 680,000 hours of labeled, multilingual audio collected from the internet. It’s available for free on GitHub, providing an accessible way to transcribe audio and to translate speech from other languages into English.

In this post, we’ll look at how to use Whisper and Google Colab to easily and freely transcribe an interview recording.

What is Google Colab?

Google Colab is a cloud-based notebook environment for writing and running code in your browser; think Google Docs, but for Python. It’s especially useful if you don’t have a powerful computer at home, because your code runs on Google’s machines, which can include a GPU.

Using Whisper with Google Colab

  1. Create a Google Colab Notebook
    Open Google Colab (colab.research.google.com) and start a new notebook. You can also go to Google Drive, right-click, select “More,” and choose Google Colaboratory.


    The notebook will be named “Untitled.ipynb” by default.

  2. Enable GPU
    Next, enable the GPU in Colab for faster processing. Go to “Runtime” > “Change runtime type,” select “GPU” under “Hardware accelerator” (newer versions of Colab list specific GPU types, such as “T4 GPU”), and save.

  3. Install Whisper
    In a new code cell, paste the following lines and run them using the play button (or with CTRL + Enter):

    !pip install git+https://github.com/openai/whisper.git
    !sudo apt update && sudo apt install -y ffmpeg
    


    This installs Whisper, plus ffmpeg, which Whisper uses to read audio files, in the notebook’s environment. (For further setup options, see the setup section of Whisper’s GitHub page.)

    The ! at the start of each line tells Colab to run that line as a shell command rather than as Python code. If you run these commands in a terminal on your own computer, leave out the !. (The apt line assumes a Debian- or Ubuntu-based system, as Colab is.)

  4. Upload Audio File
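    To transcribe audio, upload it to Google Colab: drag and drop the file into the file explorer in the left sidebar, or use its upload button. Uploaded files live in the runtime’s temporary storage and are deleted when the runtime is recycled, so download your transcripts before you leave.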

  5. Transcribe with Whisper
    Add a second code cell by clicking “+ Code,” and paste the following:

    !whisper "test_interview.m4a" --model medium --language German
    

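    Hit the play button (or press CTRL + Enter) to run the cell. Here, I used the medium model and set the language to German; if you omit --language, Whisper tries to detect it from the first 30 seconds of audio. Once the transcription is complete, the transcript files (for example .txt, .srt, and .vtt) appear in the file explorer.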
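If the whisper command fails or runs very slowly, it can help to re-check steps 2 to 4 from Python. The sketch below is a minimal troubleshooting check, assuming an NVIDIA GPU whose driver provides the nvidia-smi tool (as on Colab) and the file name test_interview.m4a from the example; list_gpus and preflight are hypothetical helper names, not part of Whisper or Colab. Note that Whisper falls back to the CPU when no GPU is available, so a missing GPU makes transcription slow rather than impossible.

```python
import shutil
import subprocess
from pathlib import Path


def list_gpus() -> list[str]:
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if no GPU is visible."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, there is no usable GPU.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    # Each GPU is reported on its own line starting with "GPU 0:", "GPU 1:", ...
    return [line for line in result.stdout.splitlines() if line.startswith("GPU")]


def preflight(audio: str = "test_interview.m4a") -> list[str]:
    """Collect setup problems; an empty list means everything looks fine."""
    problems = []
    if not list_gpus():
        problems.append("no GPU found: check Runtime > Change runtime type (step 2)")
    # pip installs a `whisper` console script; apt installs `ffmpeg`.
    for tool in ("whisper", "ffmpeg"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH (step 3)")
    if not Path(audio).is_file():
        problems.append(f"{audio} not found in the working directory (step 4)")
    return problems


for problem in preflight():
    print("Problem:", problem)
```

If the loop prints nothing, the GPU, both tools, and the audio file are all in place.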

Whisper Models

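Whisper comes in five model sizes (tiny, base, small, medium, and large) that trade speed for accuracy. At the time of writing, the command-line default is small, which is fast but less accurate than medium or large. For more accurate transcriptions, choose a larger model with --model, at the cost of speed and GPU memory.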

Here, a model is the trained neural network (its learned weights) that converts spoken audio into text. All sizes share the same architecture; the English-only variants (such as medium.en) are tuned for English audio and tend to perform better on it.
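A rough rule for choosing a size is to use the largest model that fits in your GPU’s memory. The sketch below encodes the approximate VRAM figures listed in the Whisper README at the time of writing (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, and 10 GB for large). Treat these numbers as rough guides that may change between releases; largest_model_for is a hypothetical helper, not part of Whisper.

```python
# Approximate VRAM needs in GB, as listed in the Whisper README at the time of
# writing (assumed values; newer releases may differ). Ordered smallest first.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_for(vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits in vram_gb."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    if not fitting:
        raise ValueError(f"{vram_gb} GB is below the ~1 GB the smallest model needs")
    # Dicts keep insertion order, so the last fitting entry is the largest model.
    return fitting[-1]


print(largest_model_for(6))  # medium
```

The returned name can be passed straight to the --model option.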

Whisper Command-Line Options

To view all Whisper options in Colab:

!whisper -h
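Once you know which options you need, it can be convenient to build the command from Python, for example to transcribe every recording in a folder. This is a minimal sketch, assuming the --model, --language, --task, and --output_dir flags that whisper -h lists; whisper_command and transcribe_folder are hypothetical helpers, and their defaults mirror the German interview example. The Whisper command line accepts several audio files in one call, so passing them together loads the model only once instead of once per file.

```python
import subprocess
from pathlib import Path


def whisper_command(audio_files, model="medium", language="German",
                    task="transcribe", output_dir="transcripts"):
    """Build a single whisper CLI call covering all given audio files."""
    return [
        "whisper", *(str(f) for f in audio_files),
        "--model", model,
        "--language", language,
        "--task", task,             # "translate" produces English output instead
        "--output_dir", output_dir, # folder that receives the transcript files
    ]


def transcribe_folder(pattern="*.m4a", **options):
    """Transcribe every file matching pattern in the working directory."""
    audio_files = sorted(Path(".").glob(pattern))
    if audio_files:  # the whisper CLI needs at least one input file
        # check=True raises CalledProcessError if whisper exits with an error.
        subprocess.run(whisper_command(audio_files, **options), check=True)
    return audio_files


print(whisper_command(["test_interview.m4a"]))
```

In a Colab cell, transcribe_folder("*.m4a") would transcribe all uploaded .m4a files in one run; pass task="translate" to get an English translation instead of a German transcript.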

Summary

We’ve covered the basics of transcribing audio with Whisper by running its command-line tool inside a Google Colab notebook.

Happy Transcribing!

Resources