Module 02: Audio Extraction Pipeline

Status: Active | Endpoint: /speech-to-text

About this demo

This module demonstrates an automated pipeline for converting unstructured audio data into structured text. It uses the OpenAI Whisper Large-v3 model, served through the Hugging Face Inference API, to transcribe speech with high accuracy.

  • Inputs: Raw audio files (.wav or .mp3).
  • Output: Full text transcription.
  • Tech Stack: Next.js Frontend → Python Flask API → Hugging Face Inference Cluster (see the sketch below).
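
The Flask layer sits between the Next.js upload form and the hosted model: it receives the uploaded file and forwards the raw bytes for transcription. The sketch below illustrates that flow under a few assumptions not spelled out in this module: the Inference API URL, the HF_TOKEN environment variable, the "audio" form field name, and the shape of the JSON response are all illustrative.

  import os

  import requests
  from flask import Flask, jsonify, request

  app = Flask(__name__)

  # Assumed hosted endpoint for the model; swap in a dedicated Inference
  # Endpoint URL if one is provisioned for this demo.
  HF_API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"
  HF_TOKEN = os.environ["HF_TOKEN"]  # assumed environment variable name

  @app.route("/speech-to-text", methods=["POST"])
  def speech_to_text():
      # Expect the frontend to send the file as multipart form data
      # under the (assumed) field name "audio".
      audio = request.files.get("audio")
      if audio is None:
          return jsonify({"error": "No audio file provided."}), 400

      # Forward the raw audio bytes to the hosted Whisper model.
      response = requests.post(
          HF_API_URL,
          headers={"Authorization": f"Bearer {HF_TOKEN}"},
          data=audio.read(),
          timeout=120,
      )
      response.raise_for_status()

      # The ASR pipeline is assumed to return JSON of the form {"text": "..."}.
      return jsonify({"transcription": response.json().get("text", "")})

On the frontend, a plain multipart POST to /speech-to-text with the file in the "audio" field is enough to exercise this route.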

Input Source

System Constraints (Free Tier):

  • Formats: .wav, .mp3
  • Max Size: 5MB (approx. 5 minutes of audio).
  • Note: Larger files are supported by the model but are restricted here to prevent server timeouts; the limit is enforced as sketched below.
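
Before any request leaves the server, the API can reject files that violate these limits. The helper below is an illustrative sketch: the extensions and 5MB cap mirror the list above, while the function name validate_upload and its error messages are hypothetical.

  import os

  ALLOWED_EXTENSIONS = {".wav", ".mp3"}
  MAX_SIZE_BYTES = 5 * 1024 * 1024  # 5MB cap to keep free-tier requests from timing out

  def validate_upload(filename: str, payload: bytes) -> str | None:
      """Return an error message, or None if the upload is acceptable."""
      extension = os.path.splitext(filename)[1].lower()
      if extension not in ALLOWED_EXTENSIONS:
          return f"Unsupported format: {extension or 'none'}. Use .wav or .mp3."
      if len(payload) > MAX_SIZE_BYTES:
          return "File exceeds the 5MB free-tier limit."
      return None

In the route sketched earlier, calling this helper immediately after reading the upload keeps oversized or unsupported files from ever reaching the Inference API.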
