OwusuBlessing cebb4c2fe0 made first commit

2025-07-29 01:52:11 +01:00

VoiceIQ – Intelligent Audio Command Understanding System

VoiceIQ is a modular speech-to-intent classification system that processes raw voice input, transcribes it using a pre-trained ASR model, and classifies user intent into structured components (action, object, location). It's designed for use in smart assistants, hands-free interfaces, and voice-based automation systems.

Project Goals

✅ Build an end-to-end speech pipeline using ASR + NLP
✅ Classify spoken commands into structured intents
✅ Serve predictions via a clean API or UI
✅ Ensure modularity and production readiness

Core Features

Feature	Description
Speech-to-text	Uses OpenAI Whisper for transcription
Intent classifier	Classifies transcribed text into `action`, `object`, `location`
Evaluation pipeline	Tracks WER, accuracy, precision, recall, and confusion matrices
CLI pipelines	One-command training, inference, and evaluation
API + UI	FastAPI for RESTful endpoints; Streamlit demo included
Notebooks	EDA, ASR error analysis, intent confusion reports

Project Structure

Dataset: Fluent Speech Commands

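23,132 training utterances (the full corpus has 30,043 across train, validation, and test), each a single-sentence voice command of roughly 1–2 seconds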
Labels: action, object, location
Examples:
- “Turn on the lights in the kitchen” → activate, lights, kitchen
- “Switch off the fan in the bedroom” → deactivate, fan, bedroom
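
As a rough illustration of how each labelled utterance maps to a triplet, the sketch below reads a label CSV into `(path, transcription, Intent)` records. The column names (`path`, `transcription`, `action`, `object`, `location`) are an assumption about the label file layout, and `Intent` and `read_labels` are hypothetical helpers, not part of this repository.

```python
import csv
import io
from typing import Iterator, NamedTuple, Tuple


class Intent(NamedTuple):
    """One structured label: the (action, object, location) triplet."""
    action: str
    object: str
    location: str


def read_labels(handle) -> Iterator[Tuple[str, str, Intent]]:
    """Yield (audio_path, transcription, Intent) for each CSV row.

    Assumes a header row with the columns:
    path, transcription, action, object, location.
    """
    for row in csv.DictReader(handle):
        yield row["path"], row["transcription"], Intent(
            row["action"], row["object"], row["location"]
        )


# Tiny in-memory sample standing in for a real label file (values are illustrative).
SAMPLE = io.StringIO(
    "path,transcription,action,object,location\n"
    "wavs/a.wav,Turn on the lights in the kitchen,activate,lights,kitchen\n"
    "wavs/b.wav,Switch off the fan in the bedroom,deactivate,fan,bedroom\n"
)

rows = list(read_labels(SAMPLE))
print(rows[0][2])
```

Passing an open file handle instead of `SAMPLE` would read a real label file, provided its header matches the assumed columns.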

Setup Instructions

1. Create Environment

git clone https://github.com/your-org/voiceiq.git
cd voiceiq
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

2. Download Dataset

# Download and unzip Fluent Speech Commands
# Or load an already-extracted copy with torchaudio.datasets.FluentSpeechCommands, if your torchaudio version provides it

How to Run Pipelines

1. Train Pipeline

python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "whisper+bert_baseline"
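
For readers wiring up their own entry point, here is a minimal sketch of how the two flags above could be parsed with `argparse`. Only the flag names come from the command shown; the `required` setting, the default value, and the help strings are assumptions, and the actual `train_pipeline.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags used in the training command."""
    parser = argparse.ArgumentParser(description="Train the speech-to-intent pipeline")
    # Assumption: the config path is mandatory and points to a YAML file.
    parser.add_argument("--config", required=True, help="Path to a YAML training config")
    # Assumption: the experiment name is optional and falls back to a placeholder.
    parser.add_argument("--experiment_name", default="default", help="Label for this run")
    return parser


args = build_parser().parse_args(
    ["--config", "configs/train_config.yaml", "--experiment_name", "whisper+bert_baseline"]
)
print(args.config, args.experiment_name)
```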

2. Inference Pipeline

# Predict from a WAV file
python pipelines/inference_pipeline.py --model_path models/best.pt --audio_file data/test_audio/command.wav

3. Evaluation Pipeline

python pipelines/evaluation_pipeline.py --model_path models/best.pt --test_data data/processed/test.csv

API & UI Demos

FastAPI Server

uvicorn src.api.server:app --reload

POST /predict — Upload audio and get predicted intent
GET /health — System health check

Streamlit Demo

python run_demo.py

Upload .wav or record live
View transcript, structured intent, and confidence scores

Metrics Tracked

Metric	Description
WER	Word Error Rate from ASR
Intent Acc	Accuracy for full `(action, object, location)` triplet
F1 Scores	Macro, micro, and per-label F1
Confusion Matrix	Action/Object/Location classification errors
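
To make the first two metrics concrete, the sketch below computes WER as word-level edit distance divided by the reference length, and intent accuracy as exact match on the full triplet. The function names are hypothetical and the lowercase, whitespace-split normalization is an assumption; the evaluation pipeline in this repository may normalize text differently.

```python
from typing import Sequence, Tuple

Triplet = Tuple[str, str, str]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def intent_accuracy(gold: Sequence[Triplet], predicted: Sequence[Triplet]) -> float:
    """Fraction of examples where action, object, and location all match."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


wer = word_error_rate("turn on the fan in the bedroom", "turn on the van in bedroom")
acc = intent_accuracy(
    [("activate", "fan", "bedroom"), ("deactivate", "lights", "kitchen")],
    [("activate", "fan", "bedroom"), ("deactivate", "lights", "bedroom")],
)
print(f"WER={wer:.3f} intent_acc={acc:.2f}")
```

In the example, the hypothesis has one substitution and one deletion against a seven-word reference, and one of the two triplets has a wrong location. For a whole test set, WER is usually reported as total errors over total reference words rather than as the mean of per-utterance values.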

Included Notebooks

Notebook	Purpose
`01_audio_exploration.ipynb`	Visualize waveforms, mel spectrograms
`02_asr_error_analysis.ipynb`	Compare Whisper vs Wav2Vec2 transcriptions
`03_intent_classification.ipynb`	Hyperparameter tuning, misclassification review
`04_results_analysis.ipynb`	Plot confusion matrices and F1 breakdowns

Models Used

Whisper Base (ASR) – Robust transcription of short commands
DistilBERT / BERT (or any text classifier of your choice) – Classification of transcripts into intents
Optionally: Fine-tune Whisper for joint ASR+intent learning

Design Decisions

Pipeline Modularity: All components (ASR, NLP, evaluation) are swappable
Config-Driven: Use YAML configs for training, ASR models, and evaluation
Separation of Concerns: Clean division between preprocessing, training, and inference
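
One way to get swappable components is to have the pipeline depend on small interfaces rather than on concrete models. The sketch below is a minimal illustration under that assumption: `Transcriber`, `IntentClassifier`, `VoicePipeline`, and the two stand-in classes are hypothetical names, not the classes used in this repository.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class Transcriber(Protocol):
    def transcribe(self, audio_path: str) -> str: ...


class IntentClassifier(Protocol):
    def classify(self, text: str) -> Dict[str, str]: ...


@dataclass
class VoicePipeline:
    """Chains any ASR component with any intent classifier."""
    asr: Transcriber
    nlu: IntentClassifier

    def predict(self, audio_path: str) -> dict:
        transcript = self.asr.transcribe(audio_path).strip().lower()
        return {"transcript": transcript, "intent": self.nlu.classify(transcript)}


class FixedTranscriber:
    """Stand-in for a real ASR model: always returns the same text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def transcribe(self, audio_path: str) -> str:
        return self.text


class KeywordClassifier:
    """Toy rule-based classifier; a trained model would replace this."""

    def classify(self, text: str) -> Dict[str, str]:
        words = text.split()
        action = "deactivate" if "off" in words else "activate"
        obj = next((w for w in words if w in {"lights", "fan"}), "none")
        location = next((w for w in words if w in {"kitchen", "bedroom"}), "none")
        return {"action": action, "object": obj, "location": location}


pipeline = VoicePipeline(FixedTranscriber("Turn on the fan in the bedroom"), KeywordClassifier())
result = pipeline.predict("data/test_audio/command.wav")
print(result)
```

Because `VoicePipeline` only relies on the `transcribe` and `classify` methods, a Whisper wrapper or a BERT-based classifier could be dropped in without changing the pipeline code.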

Potential Extensions

✅ Real-time streaming inference
✅ Speaker identification and voice embeddings
✅ End-to-end fine-tuning of Whisper for direct audio → intent
✅ Multilingual support via Whisper large models
✅ Deployable microservice with Docker

Example Usage

Voice Input:

"Turn on the fan in the bedroom"

Output:

{
  "transcript": "turn on the fan in the bedroom",
  "intent": {
    "action": "activate",
    "object": "fan",
    "location": "bedroom"
  },
  "confidence": {
    "action": 0.98,
    "object": 0.95,
    "location": 0.93
  }
}
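
A client consuming this output might only act when every slot is predicted confidently. The sketch below parses the response shown above and applies a per-slot threshold; `accept_intent` and the `0.9` default are illustrative assumptions rather than part of the API.

```python
import json
from typing import Dict, Optional

RESPONSE = """
{
  "transcript": "turn on the fan in the bedroom",
  "intent": {"action": "activate", "object": "fan", "location": "bedroom"},
  "confidence": {"action": 0.98, "object": 0.95, "location": 0.93}
}
"""


def accept_intent(payload: str, threshold: float = 0.9) -> Optional[Dict[str, str]]:
    """Return the intent if every slot's confidence meets the threshold, else None."""
    data = json.loads(payload)
    confidence = data["confidence"]
    if all(confidence[slot] >= threshold for slot in ("action", "object", "location")):
        return data["intent"]
    return None


print(accept_intent(RESPONSE))        # all slots clear the 0.9 threshold
print(accept_intent(RESPONSE, 0.95))  # location (0.93) falls below 0.95
```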
