Whisper is an open-source automatic speech recognition (ASR) system from OpenAI that converts speech to text. It’s multilingual and multitask, supporting:
- Speech-to-text transcription in many languages
- Translation of non‑English speech to English text
- Language identification
Whisper uses a Transformer encoder–decoder architecture trained on a large and diverse audio–text corpus, making it robust across accents, noise conditions, and technical domains. You can read all about it here.
There are a number of different model sizes that trade accuracy for speed and resource use. Common variants include tiny (87MB), small (497MB), medium (1.83GB), and large (3.39GB). Larger models are generally more accurate but significantly slower and more resource-intensive. In our testing, the “small” model provided a strong balance between accuracy and performance for most deployments.
Whisper is distributed as a single self-contained executable in the “llamafile” format, which runs on Linux, Windows, and macOS without a separate install. One file does it all. You will need to download Whisper from here -> Mozilla whisperfile on Hugging Face
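For example, on Linux or macOS you can fetch the small model directly from the command line. The exact file name below is an assumption; check the Hugging Face repository page for the current names:
Example:
curl -L -o whisper-small.llamafile https://huggingface.co/Mozilla/whisperfile/resolve/main/whisper-small.llamafile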
Note:
- Windows: after download, add the .exe extension to the file (e.g., whisper-small.llamafile.exe) so it can run.
- Linux/macOS: make the file executable with chmod +x whisper-small.llamafile, then run it from the terminal.
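As a quick sanity check, you can transcribe an audio file directly from the command line before setting up the HTTP service. This is a minimal sketch assuming the whisper.cpp-style -f flag and a 16kHz WAV file named recording.wav; confirm the flag against your build’s --help output:
Example:
./whisper-small.llamafile -f recording.wav (Linux/macOS)
whisper-small.llamafile.exe -f recording.wav (Windows)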
You can deploy Whisper on your Domino server or on a separate machine to leverage optimised hardware (e.g., a macOS system with Apple Silicon).
Preemptive AI connects to Whisper via HTTP. By default the server accepts connections only from the local machine, so you will need to change this if you are running Whisper on a different server from your Domino server. To change the port or listen on all network interfaces, specify the host and port at launch (0.0.0.0 listens on every interface, making the server reachable from the network):
Example:
./whisper-small.llamafile --host 0.0.0.0 --port 8080 (Linux/macOS)
whisper-small.llamafile.exe --host 0.0.0.0 --port 8080 (Windows)
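Once the server is running, you can verify it with curl from the machine that will connect to it. The sketch below assumes the whisper.cpp server’s /inference endpoint, which accepts a multipart form upload; your-whisper-host and recording.wav are placeholders for your own values:
Example:
curl http://your-whisper-host:8080/inference -F file=@recording.wav -F response_format=json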
Note: Binding to 0.0.0.0 exposes the service on your network. Place it behind your firewall or restrict access (e.g., reverse proxy, allowlist, or VPN).
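As one way to restrict access, on a Linux host running ufw (with its default deny-incoming policy) you could allow only your Domino server to reach the port. Here 10.0.0.5 is a placeholder for the Domino server’s IP address:
Example:
sudo ufw allow from 10.0.0.5 to any port 8080 proto tcp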
There are many additional command-line arguments you can pass to Whisper; some of them are documented here -> ggml-org/whisper.cpp examples/cli
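You can also list the options your particular build supports; most llamafile builds print them with --help:
Example:
./whisper-small.llamafile --help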
Performance tips:
- On Apple Silicon, use the appropriate build that leverages Metal and include the command-line argument --gpu metal.
- Adjust the number of threads Whisper can use to match your server’s hardware (see the sketch below).
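Putting these tips together, a launch command on an Apple Silicon machine might look like the sketch below. The --gpu metal flag comes from the tip above; --threads follows the whisper.cpp convention and is an assumption, so confirm both against your build’s --help output:
Example:
./whisper-small.llamafile --host 0.0.0.0 --port 8080 --gpu metal --threads 8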