Mati Staniszewski, co-founder and CEO of ElevenLabs, emphasizes that voice is emerging as the primary interface for artificial intelligence, revolutionizing the way humans engage with machines beyond traditional text and screens.
Speaking with TechCrunch at the Web Summit in Doha, Staniszewski said that ElevenLabs' voice models have evolved from merely imitating human speech to conveying emotional nuance and intonation. That advance lets them work in tandem with the reasoning capabilities of large language models, marking a significant shift in how people interact with technology.
Looking ahead, Staniszewski envisions a future in which smartphones stay tucked away and users engage with their surroundings while controlling technology by voice. That vision helped ElevenLabs secure a $500 million investment this week, lifting the company's valuation to $11 billion. It is also resonating across the AI landscape: OpenAI and Google are prioritizing voice in their upcoming models, while Apple quietly develops voice-centric technology through strategic acquisitions.
As AI spreads across more devices, the focus is shifting from screen interactions to voice commands, establishing voice as a pivotal element of AI's next phase. Seth Pierrepont, a general partner at Iconiq Capital, echoed this view at the Web Summit, noting that while screens remain vital for entertainment, traditional input methods are becoming outdated.
As AI systems gain more autonomy, Pierrepont noted, user interactions will evolve, requiring fewer explicit instructions as models become more context-aware. Staniszewski called this shift toward agentic systems one of the most significant changes currently underway. Future voice technologies will increasingly draw on memory and context accumulated over time, making interactions feel more intuitive and effortless.
Staniszewski elaborated on how this evolution will reshape the way voice models are deployed. High-quality audio processing has so far happened mostly in the cloud, but ElevenLabs is pursuing a hybrid model that combines cloud and on-device processing. The approach is meant to support new hardware, such as headphones and wearables, turning voice into a continuous companion rather than a feature invoked on demand.
ElevenLabs is collaborating with Meta to integrate its voice technology into platforms like Instagram and Horizon Worlds, the company's virtual reality environment. Staniszewski also expressed interest in working with Meta on its Ray-Ban smart glasses as voice-driven interfaces expand into new form factors.
However, as voice technology becomes more ingrained in everyday devices, it raises significant privacy and data-security concerns, given how deeply these systems reach into users' daily lives.