Automatic Speech Recognition

Overview

This project combines three AI capabilities: speech-to-text recognition, emotion detection from audio, and automatic text summarization. The system is designed to handle diverse accents in challenging acoustic environments and to interpret the emotional content of spoken language.

Key Features

  • Speech Recognition: Conformer model for ASR (Automatic Speech Recognition)
  • Emotion Detection: XGBoost for Speech Emotion Recognition (SER)
  • Text Summarization: BART Large for automatic text summarization
  • Designed for diverse accents in challenging acoustic environments
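
The sketch below shows one way the three features could be wired together. It assumes that speech recognition and emotion detection both read the audio and that summarization runs on the transcript; the project does not document its actual data flow. All names here (SpeechPipeline, AnalysisResult, and the fake_* functions) are hypothetical. The stand-in stages are toy rules, not the Conformer, XGBoost, or BART Large models.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interfaces: each stage is a plain callable, so a real model
# could be placed behind any of them without changing the pipeline.
Transcriber = Callable[[Sequence[float]], str]
EmotionClassifier = Callable[[Sequence[float]], str]
Summarizer = Callable[[str], str]


@dataclass
class AnalysisResult:
    transcript: str
    emotion: str
    summary: str


@dataclass
class SpeechPipeline:
    transcribe: Transcriber
    detect_emotion: EmotionClassifier
    summarize: Summarizer

    def run(self, audio: Sequence[float]) -> AnalysisResult:
        # Assumed data flow: ASR and SER read the audio,
        # summarization reads the transcript.
        transcript = self.transcribe(audio)
        emotion = self.detect_emotion(audio)
        summary = self.summarize(transcript)
        return AnalysisResult(transcript, emotion, summary)


# Stand-in stages for illustration only; none of these is a real model.
def fake_transcribe(audio: Sequence[float]) -> str:
    return "the meeting is moved to friday. please update your calendars."


def fake_emotion(audio: Sequence[float]) -> str:
    # Toy rule: mean absolute amplitude above 0.5 counts as "excited".
    energy = sum(abs(x) for x in audio) / max(len(audio), 1)
    return "excited" if energy > 0.5 else "neutral"


def fake_summarize(text: str) -> str:
    # Toy rule: keep only the first sentence.
    return text.split(".")[0].strip() + "."


pipeline = SpeechPipeline(fake_transcribe, fake_emotion, fake_summarize)
result = pipeline.run([0.1, -0.2, 0.05])
print(result)
```

Keeping each stage behind a callable means a trained model can replace any stand-in independently, and the pipeline can be tested without loading model weights.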

Technologies

  • Models: Conformer, XGBoost, BART Large
  • Training Data: Established speech datasets

Use Cases

  • Customer service analytics
  • Meeting transcription and analysis
  • Media content analysis
  • Educational applications
