Module 02: Audio Extraction Pipeline

Status: Active | Endpoint: /speech-to-text

About this demo

This module demonstrates an automated pipeline for converting unstructured audio data into structured text. It uses the OpenAI Whisper Large-v3 model, served through the Hugging Face Inference API, to transcribe speech with high accuracy.

  • Inputs: Raw audio files (.wav or .mp3).
  • Output: Full text transcription.
  • Tech Stack: Next.js Frontend → Python Flask API → Hugging Face Inference Cluster (see the sketch below).
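
The Flask layer sits between the Next.js upload form and the hosted model: it receives the uploaded file and forwards the raw bytes for transcription. The sketch below illustrates that flow under a few assumptions not spelled out in this module: the Inference API URL, the HF_TOKEN environment variable, the "audio" form field name, and the shape of the JSON response are all illustrative.

  import os

  import requests
  from flask import Flask, jsonify, request

  app = Flask(__name__)

  # Assumed hosted endpoint for the model; swap in a dedicated Inference
  # Endpoint URL if one is provisioned for this demo.
  HF_API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"
  HF_TOKEN = os.environ["HF_TOKEN"]  # assumed environment variable name

  @app.route("/speech-to-text", methods=["POST"])
  def speech_to_text():
      # Expect the frontend to send the file as multipart form data
      # under the (assumed) field name "audio".
      audio = request.files.get("audio")
      if audio is None:
          return jsonify({"error": "No audio file provided."}), 400

      # Forward the raw audio bytes to the hosted Whisper model.
      response = requests.post(
          HF_API_URL,
          headers={"Authorization": f"Bearer {HF_TOKEN}"},
          data=audio.read(),
          timeout=120,
      )
      response.raise_for_status()

      # The ASR pipeline is assumed to return JSON of the form {"text": "..."}.
      return jsonify({"transcription": response.json().get("text", "")})

On the frontend, a plain multipart POST to /speech-to-text with the file in the "audio" field is enough to exercise this route.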

Input Source

System Constraints (Free Tier):

  • Formats: .wav, .mp3
  • Max Size: 5MB (approx. 5 minutes of audio).
  • Note: Larger files are supported by the model but are restricted here to prevent server timeouts; the limit is enforced as sketched below.
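
Before any request leaves the server, the API can reject files that violate these limits. The helper below is an illustrative sketch: the extensions and 5MB cap mirror the list above, while the function name validate_upload and its error messages are hypothetical.

  import os

  ALLOWED_EXTENSIONS = {".wav", ".mp3"}
  MAX_SIZE_BYTES = 5 * 1024 * 1024  # 5MB cap to keep free-tier requests from timing out

  def validate_upload(filename: str, payload: bytes) -> str | None:
      """Return an error message, or None if the upload is acceptable."""
      extension = os.path.splitext(filename)[1].lower()
      if extension not in ALLOWED_EXTENSIONS:
          return f"Unsupported format: {extension or 'none'}. Use .wav or .mp3."
      if len(payload) > MAX_SIZE_BYTES:
          return "File exceeds the 5MB free-tier limit."
      return None

In the route sketched earlier, calling this helper immediately after reading the upload keeps oversized or unsupported files from ever reaching the Inference API.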
