AI - Audio
Artificial Intelligence (AI) is particularly exciting for us because the field is evolving rapidly and demand is growing across many domains. It gives us the opportunity to work innovatively and offer solutions that have not existed before. In the audio domain, we build speech-to-text, transcription, and translation solutions for various industries. The biggest challenges lie in language itself, which is organic and not always logical. Here, AI needs to learn to think 'humanly'.
Translation App
For a transportation company, we are developing an app that translates spoken conversations into spoken audio in another language. It is used for internal phone calls between employees in different countries and for customer conversations across language barriers. The app determines the source language through speech recognition and translates live, for example from English to German or from German to French. Some industry-specific phrases are not found in dictionaries, so we define them as predefined messages (PDMs) that the AI translates according to our specifications. During development, the app generates audio files from all conversations, which linguists then review in an analysis system. They correct the texts, and their corrections improve the AI, which also learns to identify dialects. We run the app on Azure's infrastructure for high availability. Azure also provides the appropriate security policies and GDPR-compliant infrastructure in Germany, which matters because voice recordings are personal data and individual files may need to be deleted upon request.
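To illustrate the idea behind PDMs, here is a minimal TypeScript sketch of a lookup that returns a fixed, linguist-approved translation when a phrase matches a predefined message and otherwise hands the text to a general translation service. All names here (`PredefinedMessage`, `buildPdmIndex`, `lookupPdm`, `translateWithPdms`, and the `Translator` callback) are hypothetical and only serve the example; the sketch assumes exact matching after simple normalization and does not describe how the actual app is implemented.

```typescript
// Hypothetical types and names for illustration only.
type LanguageCode = "en" | "de" | "fr";

interface PredefinedMessage {
  source: string;            // industry-specific phrase in the source language
  sourceLang: LanguageCode;
  // Fixed translations per target language, e.g. supplied by linguists.
  translations: Partial<Record<LanguageCode, string>>;
}

// Stand-in for whichever machine translation service is used (assumption).
type Translator = (text: string, from: LanguageCode, to: LanguageCode) => Promise<string>;

// Trim, lowercase, and collapse whitespace so small variations still match.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Build an index keyed by "<sourceLang>:<normalized phrase>".
function buildPdmIndex(pdms: PredefinedMessage[]): Map<string, PredefinedMessage> {
  const index = new Map<string, PredefinedMessage>();
  for (const pdm of pdms) {
    index.set(`${pdm.sourceLang}:${normalize(pdm.source)}`, pdm);
  }
  return index;
}

// Return the fixed translation if one exists for this phrase and target language.
function lookupPdm(
  text: string,
  from: LanguageCode,
  to: LanguageCode,
  index: Map<string, PredefinedMessage>,
): string | undefined {
  const hit = index.get(`${from}:${normalize(text)}`);
  return hit?.translations[to];
}

// Prefer the predefined translation; fall back to the general translator.
async function translateWithPdms(
  text: string,
  from: LanguageCode,
  to: LanguageCode,
  index: Map<string, PredefinedMessage>,
  fallback: Translator,
): Promise<string> {
  const fixed = lookupPdm(text, from, to, index);
  return fixed !== undefined ? fixed : fallback(text, from, to);
}
```

Checking the PDM index before calling the general translator keeps specialist vocabulary consistent, while everything else still goes through the normal translation path. A real system would likely need fuzzier matching than this exact-match sketch.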
Personalized Audiobooks
AI solutions in the audio domain also have many use cases in direct consumer products. That's why we built NarrAItor - Personalized Audiobooks. Sometimes a parent or grandparent cannot be present to read a story to a child. Personalized audiobooks make it possible to have entire books read in the voice of a specific person without that person actually recording the reading. We only need an audio file of 30 to 60 seconds to generate a voice that closely resembles the original. We continuously improve the quality of our audiobooks through model training and by incorporating new models from ElevenLabs. The product itself is driven by a fast iteration and development cycle, using a base AI repository with React, TypeScript, and Next.js, as well as an automated and scalable infrastructure with GitHub and Vercel.
Our location
KöpenickerAufgang 1
10179 Berlin