The Hindi Wake Words Dataset is a high-quality speech dataset for training wake word detection systems in Hindi. It contains 10 hours of audio recorded across diverse accents, genders, and environments, to support robust wake word detection and speech recognition models.
Key Features
- Total Audio Duration: 10 hours of high-quality recordings.
- Wake Words Included:
  - Namaste
  - Suno
  - Shuru Karo
  - Jai Hind
- Diversity:
  - Speakers:
    - Age Groups: 18–60 years.
    - Balanced gender representation: 50% male, 50% female.
    - Accents: Regional Hindi variations from North, West, East, and South India.
- Environmental Variability:
  - Indoor and outdoor recordings.
  - Quiet and noisy environments, including marketplaces, offices, and traffic zones.
- File Format:
  - WAV audio files.
  - Sampling Rate: 16 kHz.
  - Bit Depth: 16-bit.
- Annotations:
  - Each audio file is accompanied by metadata, including wake word type, speaker information, and recording conditions.
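
The file format listed above (WAV, 16 kHz, 16-bit) can be checked before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the constants are illustrative and not part of the dataset. It makes no assumption about the number of channels, which this description does not specify.

```python
import wave

# Values taken from the File Format section of this description.
EXPECTED_SAMPLE_RATE = 16_000  # 16 kHz
EXPECTED_SAMPLE_WIDTH = 2      # 16-bit samples are 2 bytes wide


def check_wav(path):
    """Return (matches_spec, duration_seconds) for a single WAV file.

    Hypothetical helper for illustration; not shipped with the dataset.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()

    # The file matches when both sampling rate and bit depth agree with the spec.
    matches = rate == EXPECTED_SAMPLE_RATE and width == EXPECTED_SAMPLE_WIDTH
    # Duration is the frame count divided by frames per second.
    duration = frames / rate if rate else 0.0
    return matches, duration
```

Calling `check_wav` on each file and summing the returned durations gives a quick sanity check against the stated total of 10 hours. The actual file names and directory layout depend on how the dataset is distributed.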