Hindi Wake Words Dataset – 10 Hours

The Hindi Wake Words Dataset is a speech dataset built for training wake word detection systems in Hindi. It contains 10 hours of audio recorded by speakers of different accents and genders in a variety of environments, so that speech recognition models trained on it stay robust to real-world variation.

Category

Hindi

Number of Participants

30

Total Hours

30

Last Updated

January 8, 2025

Key Features

Total Audio Duration: 10 hours of high-quality recordings.
Wake Words Included:
- Namaste
- Suno
- Shuru Karo
- Jai Hind
Diversity:
- Speakers:
  - Age Groups: 18–60 years.
  - Balanced gender representation: 50% male, 50% female.
- Accents: Regional Hindi variations from North, West, East, and South India.
Environmental Variability:
- Indoor and outdoor recordings.
- Quiet and noisy environments, including marketplaces, offices, and traffic zones.
File Format:
- WAV audio files.
- Sampling Rate: 16 kHz.
- Bit Depth: 16-bit.
Annotations:
- Each audio file is accompanied by metadata, including wake word type, speaker information, and recording conditions.
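
The file format listed above (WAV, 16 kHz, 16-bit) can be checked on a downloaded file before training. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav` and the file name in the usage note are hypothetical, and the channel count is not checked because the listing does not state it.

```python
import wave

# Expected values taken from the File Format list above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples are 2 bytes wide


def check_wav(path):
    """Return (matches_spec, duration_seconds) for one WAV file.

    matches_spec is True when the sampling rate and bit depth
    agree with the values stated in the dataset description.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    matches = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    duration = frames / rate if rate else 0.0
    return matches, duration
```

For example, `ok, seconds = check_wav("namaste_0001.wav")` (a made-up file name) reports whether that recording matches the stated format and how long it is. Summing the durations over all files is one way to compare the delivered audio against the advertised total.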

Get In Touch

info@indianspeechdatasets.com