Hindi Wake Words Dataset – 10 Hours 

The Hindi Wake Words Dataset is a speech dataset designed specifically for training wake word detection systems in Hindi. It features 10 hours of audio recorded across diverse accents, genders, and environments, to support robust AI and speech recognition models.

Category

Hindi

Number of Participants

30

Total Hours

10

Last Updated

January 8, 2025

Key Features

  1. Total Audio Duration: 10 hours of high-quality recordings.
  2. Wake Words Included:
    • Namaste
    • Suno
    • Shuru Karo
    • Jai Hind
  3. Diversity:
    • Speakers:
      • Age Groups: 18–60 years.
      • Balanced gender representation: 50% male, 50% female.
    • Accents: Regional Hindi variations from North, West, East, and South India.
  4. Environmental Variability:
    • Indoor and outdoor recordings.
    • Quiet and noisy environments, including marketplaces, offices, and traffic zones.
  5. File Format:
    • WAV audio files.
    • Sampling Rate: 16 kHz.
    • Bit Depth: 16-bit.
  6. Annotations:
    • Each audio file is accompanied by metadata, including wake word type, speaker information, and recording conditions.
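To confirm that a downloaded file matches the stated format (WAV, 16 kHz, 16-bit), a check along the following lines may help. This is a minimal sketch using Python's standard wave module, not an official tool for this dataset. The function names check_wav and make_silent_wav are hypothetical, and the demo runs on a synthetic mono clip because the listing does not state the channel count or file layout.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000   # from the listing: 16 kHz sampling rate
EXPECTED_SAMPLE_WIDTH = 2   # from the listing: 16-bit = 2 bytes per sample


def check_wav(source):
    """Return (matches_spec, duration_seconds) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    matches = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH
    return matches, frames / rate


def make_silent_wav(rate_hz, sample_width, seconds):
    """Build an in-memory mono WAV of silence (demo input only, an assumption)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (rate_hz * sample_width * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    ok, duration = check_wav(make_silent_wav(16_000, 2, 1))
    print(ok, duration)  # True 1.0
```

Summing the returned durations over all files gives a quick way to compare the total against the advertised 10 hours.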

Get In Touch

info@indianspeechdatasets.com
