Comparative analysis of Whisper and Deepgram STT models

Setup and installation instructions.

To run the inference scripts transcription_whisper.py and transcription_deepgram.py, kindly ensure that:

Python 3.11 or later is installed on your system;
All necessary libraries listed in requirements.txt are installed in your environment;
The input audio file and ground truth reference text come from the Common Voice Delta Segment 19.0 dataset, which provides audio files in mp3 format and transcriptions in tsv format. For the purpose of this test task, the mp3 file was converted to wav programmatically;
To process your own audio file(s), the correct name and path have to be specified, along with the matching ground truth reference text file(s).

Note

The Whisper transcription script automatically converts the audio sample rate to 16000 Hz to meet the inference requirements.
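To illustrate what a sample rate conversion to 16000 Hz involves, here is a minimal sketch using plain linear interpolation on a list of mono samples. It is not the method used by transcription_whisper.py, which is not shown here; the function name resample_linear is hypothetical, and a production resampler would also apply an anti-aliasing filter when downsampling.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal by linear interpolation (illustrative only).

    `samples` is a sequence of numbers, `src_rate` and `dst_rate` are in Hz.
    No anti-aliasing filter is applied, so quality is limited when
    downsampling. This is an assumption-laden sketch, not the script's code.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return [float(s) for s in samples]

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        if i >= last:
            # Past the final source sample: hold the last value.
            out.append(float(samples[last]))
            continue
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


if __name__ == "__main__":
    # Four samples at 8000 Hz become eight samples at 16000 Hz.
    print(resample_linear([0, 10, 20, 30], 8000))
```

Doubling the rate inserts one interpolated value between each pair of neighbouring samples, so the duration of the audio is unchanged.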

An explanation of the metrics logged (latency, WER).

The log file metrics.log is structured so that the user can review historical transcription results for both the Whisper and Deepgram models. Each result includes the latency and word error rate along with the transcribed text. Latency is defined as the time elapsed while the main transcription function is executing. For Deepgram, it also includes establishing the websocket connection. Word error rate (WER) is defined by the following formula:

WER = (S + D + I)/(S + D + C),

where:

S is the number of substitutions (e.g. 'Dolly' vs the actual text 'DALL·E');
D is the number of deletions (e.g. 'I speech-to-text' vs the actual text 'I like speech-to-text');
I is the number of insertions (e.g. 'I really like speech-to-text' vs the actual text 'I like speech-to-text');
C is the number of correctly predicted words.

The denominator S + D + C equals the number of words in the reference text.
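The two metrics described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code from the repository's scripts: the names measure_latency, wer_counts and word_error_rate are hypothetical, words are split on whitespace with no text normalization, and the alignment is a standard word-level edit distance.

```python
import time


def measure_latency(transcribe, *args, **kwargs):
    """Run `transcribe` and return (result, elapsed seconds).

    `transcribe` stands in for the main transcription function.
    """
    start = time.perf_counter()
    result = transcribe(*args, **kwargs)
    return result, time.perf_counter() - start


def wer_counts(reference, hypothesis):
    """Return (S, D, I, C) from a minimum-edit word alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = (cost, S, D, I, C) for ref[:i] aligned with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0, 0)  # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j, 0)  # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, s, d, ins, c = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, s, d, ins, c + 1)      # correct word
            else:
                diagonal = (cost + 1, s + 1, d, ins, c)  # substitution
            cost, s, d, ins, c = dp[i - 1][j]
            deletion = (cost + 1, s, d + 1, ins, c)
            cost, s, d, ins, c = dp[i][j - 1]
            insertion = (cost + 1, s, d, ins + 1, c)
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    return dp[-1][-1][1:]


def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / (S + D + C)."""
    s, d, ins, c = wer_counts(reference, hypothesis)
    if s + d + c == 0:
        raise ValueError("reference text must contain at least one word")
    return (s + d + ins) / (s + d + c)


if __name__ == "__main__":
    # A stand-in transcriber; a real one would call Whisper or Deepgram.
    text, latency = measure_latency(lambda path: "I speech-to-text", "input_audio.wav")
    print(f"latency={latency:.6f}s wer={word_error_rate('I like speech-to-text', text):.3f}")
```

With the examples given above, the deletion case and the insertion case each yield a WER of 1/3 (one error against a three-word reference), and the substitution case yields 1.0.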

References

Kindly consult the official Whisper and Deepgram documentation for more information.

