About this demo
This module demonstrates an automated pipeline for converting unstructured audio data into structured text. It uses the OpenAI Whisper Large-v3 model, accessed through the Hugging Face Inference API, to transcribe speech with high accuracy.
- Inputs: Raw audio files (.wav or .mp3).
- Output: Full text transcription.
- Tech Stack: Next.js Frontend → Python Flask API → Hugging Face Inference Cluster (see the backend sketch after this list).
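The Flask layer essentially proxies the uploaded file to the Hugging Face Inference API and returns the model's text. Below is a minimal sketch of that call, assuming a `/api/transcribe` route, an `audio` form field, and an `HF_API_TOKEN` environment variable; these names are illustrative, not the demo's actual identifiers.

```python
# Minimal sketch of the Flask API layer (not the demo's exact code).
# Assumed: the /api/transcribe route, the "audio" form field, and an
# HF_API_TOKEN environment variable holding a Hugging Face access token.
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

HF_API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"
HF_TOKEN = os.environ.get("HF_API_TOKEN", "")


@app.route("/api/transcribe", methods=["POST"])
def transcribe():
    # The Next.js frontend uploads the audio file as multipart form data.
    audio = request.files.get("audio")
    if audio is None:
        return jsonify({"error": "No audio file provided."}), 400

    # Forward the raw audio bytes to the hosted Whisper Large-v3 model.
    resp = requests.post(
        HF_API_URL,
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        data=audio.read(),
        timeout=120,
    )
    resp.raise_for_status()

    # The ASR endpoint responds with JSON of the form {"text": "..."}.
    return jsonify({"text": resp.json().get("text", "")})


if __name__ == "__main__":
    app.run(port=5000)
```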
Input Source
System Constraints (Free Tier):
- Formats: .wav, .mp3
- Max Size: 5 MB (roughly 5 minutes of typical MP3 audio; uncompressed .wav files reach the cap sooner).
- Note: Larger files are supported by the model but are restricted here to prevent server timeouts; a sketch of the server-side checks follows this list.
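As a rough illustration of how these limits might be enforced server-side, the helper below rejects anything that is not a .wav/.mp3 file under 5 MB. The function name and error messages are assumptions for this sketch, not taken from the demo's code.

```python
# Hedged sketch of the free-tier input checks; the limits mirror the list
# above (.wav/.mp3 only, 5 MB cap), but the helper name and messages are
# assumptions for illustration.
import os

MAX_BYTES = 5 * 1024 * 1024          # 5 MB free-tier cap
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def validate_upload(filename: str, size_bytes: int):
    """Return an error message if the upload violates the constraints, else None."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"Unsupported format '{ext or 'none'}'; please upload a .wav or .mp3 file."
    if size_bytes > MAX_BYTES:
        return "File exceeds the 5 MB limit for this demo."
    return None
```

In the Flask route sketched earlier, a check like this would run before the request is forwarded, so oversized or unsupported uploads fail fast instead of tying up the inference call.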
Transcription Output
The output panel reads "Waiting for input stream..." until an audio file is uploaded; once processing completes, it is replaced with the full transcription.