Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
The Code-Switching Conundrum
More than half of the world’s population speaks more than one language. For many bilingual speakers, code-switching, the practice of alternating between languages within a single conversation, is a natural part of everyday communication. Voice agents, however, struggle with it. Automatic speech recognition (ASR) is the first step in any voice agent pipeline, so a transcription error made there propagates into every subsequent processing step. The result can be frustrated customers, compromised security, and a host of other problems.
Given how widely voice agents are deployed in customer service and other applications, the lack of research on how they handle code-switched speech is a notable gap. The question became concrete when a customer asked how voice agents perform with bilingual customers. In response, Hugging Face built a benchmark and dataset for evaluating models on code-switched speech, with a focus on deployments that serve bilingual customer bases. As the Hugging Face team noted, code-switching is common in multilingual communities, so voice agents need to be able to handle it.
Under the Hood
The benchmarking process involved creating a dataset of code-switched speech and evaluating frontier ASR models on it. Researchers combined manual transcription with automated evaluation metrics to assess performance. The models performed well on monolingual speech but struggled with code-switched speech, where transcription accuracy was significantly lower. Because ASR sits at the front of the pipeline, its transcription errors have a significant impact on the overall performance of the voice agent, and code-switching exacerbates them.
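The automated metrics are not named here. A common choice for ASR evaluation is word error rate (WER), so the sketch below assumes it and compares a monolingual subset against a code-switched one. The function names and the sample transcripts are hypothetical illustrations, not data or code from the benchmark.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] is the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_tokens = reference.lower().split()
        hyp_tokens = hypothesis.lower().split()
        total_edits += edit_distance(ref_tokens, hyp_tokens)
        total_words += len(ref_tokens)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


# Hypothetical (reference, ASR output) pairs, invented for illustration only.
monolingual = [
    ("please reset my password", "please reset my password"),
    ("my laptop will not start", "my laptop will not start"),
]
code_switched = [
    ("necesito reset my password por favor",
     "necesito reset my password for favor"),
    ("mi laptop no prende since yesterday",
     "me laptop no friend since yesterday"),
]

if __name__ == "__main__":
    print(f"monolingual WER:   {corpus_wer(monolingual):.2f}")
    print(f"code-switched WER: {corpus_wer(code_switched):.2f}")
```

Reporting the two subsets separately is the point of the design: a single pooled score would hide a gap between monolingual and code-switched performance.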
Developers and researchers can take several steps to close this gap: evaluate ASR models with the Hugging Face benchmark, develop more inclusive ASR models, collect and annotate more code-switched speech data, and collaborate with linguists and language experts. Improving transcription accuracy is essential to preventing errors and protecting the security and integrity of customer interactions, and techniques such as active learning or human-in-the-loop feedback can help reduce errors. Working together on this challenge, the community can build more effective and inclusive voice agents that meet the needs of multilingual users.
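One way human-in-the-loop feedback could look in practice is to route the least confident transcripts to a human reviewer and feed the corrections back into training data. The sketch below is a minimal illustration under stated assumptions: the `Transcript` class, the `confidence` field, the threshold, and the review budget are all hypothetical, and real ASR systems expose confidence in different ways, if at all.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    utterance_id: str
    text: str
    confidence: float  # assumed per-utterance score in [0, 1]; hypothetical field


def select_for_review(transcripts, threshold=0.8, budget=2):
    """Pick up to `budget` transcripts below `threshold`, least confident first."""
    uncertain = [t for t in transcripts if t.confidence < threshold]
    uncertain.sort(key=lambda t: t.confidence)
    return uncertain[:budget]


# Invented examples for illustration only.
batch = [
    Transcript("u1", "please reset my password", 0.95),
    Transcript("u2", "necesito reset my password for favor", 0.55),
    Transcript("u3", "my laptop will not start", 0.70),
    Transcript("u4", "me laptop no friend since yesterday", 0.62),
]

if __name__ == "__main__":
    for item in select_for_review(batch):
        print(item.utterance_id, item.confidence, item.text)
```

Sorting by confidence before applying the budget means limited reviewer time goes to the utterances the model is least sure about, which is the basic idea behind uncertainty-based active learning.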
The Impact
This research highlights the need for voice agents that can handle the complexities of code-switched speech. The findings matter most for enterprise deployments, particularly contact centers and IT helpdesks that serve largely bilingual customer bases. As voice agents become more common in these settings, handling code-switched speech will be crucial to preventing errors and protecting the security and integrity of customer interactions. The consequences of inaccurate voice agents can be severe, ranging from frustrated customers to compromised security.
Taking Action
To address the challenges of code-switched speech in voice agents, developers and researchers should act now. That means using the Hugging Face benchmark to evaluate ASR models on code-switched speech, developing more inclusive ASR models, and improving transcription accuracy. Collecting and annotating more code-switched speech data is essential to improving model performance further, and collaboration with linguists and language experts helps teams understand the complexities of code-switched speech and develop more effective solutions.