Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Key Facts

Over half of the world's population speaks more than one language

Code-switching is a natural part of everyday communication for many bilingual speakers

There has been little work focused on how voice agents handle code-switched speech in enterprise settings

Automatic speech recognition (ASR) is the first step in any voice agent pipeline

Transcription errors in ASR propagate forward into subsequent processing steps

Hugging Face built a benchmark and dataset to evaluate models on code-switched speech

The benchmark evaluates how voice agents perform for bilingual customer bases

The Code-Switching Conundrum

More than half of the world’s population speaks more than one language. Code-switching, the practice of alternating between languages within a single conversation, is a natural part of everyday communication for many bilingual speakers. Voice agents, however, often struggle with it. Automatic speech recognition (ASR) is the first step in any voice agent pipeline, so a transcription error made there propagates into every subsequent processing step. The result can be frustrated customers, compromised security, and other downstream failures.
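To make the propagation concrete, here is a minimal sketch of a downstream step that depends entirely on the transcript. The keyword router, the intent names, and the example utterances are all hypothetical and are not taken from the benchmark; real voice agents typically use a language model rather than keyword matching, but the dependency on the transcript is the same.

```python
# Hypothetical intent table for illustration only.
INTENT_KEYWORDS = {
    "cancel_subscription": {"cancel", "cancelar"},
    "reset_password": {"password", "contraseña"},
}


def route_intent(transcript: str) -> str:
    """Return the first intent whose keywords appear in the transcript."""
    words = set(transcript.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "fallback"


# What the customer said, and an invented example of a mis-transcription
# in which the Spanish word is heard as two English words.
spoken = "i need to cancelar my subscription"
misheard = "i need to can solar my subscription"

if __name__ == "__main__":
    print(route_intent(spoken))    # cancel_subscription
    print(route_intent(misheard))  # fallback
```

A single misrecognized word at the language boundary is enough to send the request down the wrong path, and nothing later in the pipeline can recover the word that was actually spoken.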

Little work has examined how voice agents handle code-switched speech in enterprise settings, a notable gap given how widely voice agents are used in customer service and other applications. To address it, Hugging Face built a benchmark and dataset for evaluating models on code-switched speech, with a focus on voice agents that serve bilingual customer bases. As the Hugging Face team noted, code-switching is common in multilingual communities, and voice agents need to be able to handle it. The benchmark was prompted by a customer inquiry about how voice agents perform with bilingual customers, which underlines the real-world need for more inclusive voice agents.

Under the Hood

The benchmarking process involved creating a dataset of code-switched speech and evaluating frontier ASR models, meaning current state-of-the-art systems, on that dataset. Researchers combined manual transcription with automated evaluation metrics to assess the models. The results showed that the models performed well on monolingual speech but transcribed code-switched speech significantly less accurately. Transcription errors in ASR affect the overall performance of the voice agent, and code-switching makes those errors more likely, which is what makes this a hard problem for voice agents.
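The source does not say which automated metrics were used. Word error rate (WER) is the standard metric for ASR, so the sketch below assumes WER and compares a monolingual subset with a code-switched one. The subset labels and the three sample transcripts are invented for illustration and are not benchmark data.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped by the ASR
                curr[j - 1] + 1,     # extra word inserted by the ASR
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer_by_subset(samples):
    """Pooled WER per subset: total word edits / total reference words."""
    edits = defaultdict(int)
    words = defaultdict(int)
    for subset, reference, hypothesis in samples:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        edits[subset] += word_edit_distance(ref, hyp)
        words[subset] += len(ref)
    return {s: edits[s] / words[s] for s in words if words[s] > 0}


# Invented (subset, human reference, ASR output) triples.
SAMPLES = [
    ("monolingual", "please reset my password", "please reset my password"),
    ("monolingual", "what is my account balance", "what is my account balance"),
    ("code_switched", "quiero reset my password por favor",
     "kero reset my password for favor"),
]

if __name__ == "__main__":
    for subset, score in wer_by_subset(SAMPLES).items():
        print(f"{subset}: WER = {score:.3f}")
```

In this toy data the code-switched subset has 2 substituted words out of 6 reference words, while the monolingual subset has none. Reporting the two subsets separately is the point of the design: a single pooled score would hide a gap of the kind the benchmark reports.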

Developers and researchers can take several steps to address this gap: evaluate ASR models using the Hugging Face benchmark, develop more inclusive ASR models, collect and annotate more code-switched speech data, and collaborate with linguists and language experts. Improving transcription accuracy is essential to preventing errors and protecting the security and integrity of customer interactions, and techniques such as active learning or human-in-the-loop feedback can help. Working together on this challenge can produce more effective and inclusive voice agents that meet the needs of multilingual users.
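One common form of active learning with human-in-the-loop feedback is uncertainty sampling: send the transcripts the model is least sure about to a human annotator, then use the corrections as new training or evaluation data. The sketch below assumes the ASR system exposes a per-word confidence score between 0 and 1, which not every system does. The `Transcript` type, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    utterance_id: str
    text: str
    word_confidences: List[float]  # assumed per-word scores in [0, 1]


def review_queue(transcripts: List[Transcript], budget: int) -> List[Transcript]:
    """Pick the `budget` transcripts whose weakest word is least confident."""
    def weakest_word(t: Transcript) -> float:
        # A transcript with no words is treated as maximally uncertain.
        return min(t.word_confidences, default=0.0)

    return sorted(transcripts, key=weakest_word)[:budget]


# Invented ASR outputs with invented confidence scores.
BATCH = [
    Transcript("u1", "please reset my password", [0.98, 0.95, 0.97, 0.96]),
    Transcript("u2", "kero reset my password", [0.42, 0.91, 0.93, 0.88]),
    Transcript("u3", "for favor", [0.60, 0.93]),
]

if __name__ == "__main__":
    for t in review_queue(BATCH, budget=2):
        print(t.utterance_id, t.text)
```

Ranking by the weakest word rather than the average is a deliberate choice for this setting: in a code-switched utterance the uncertainty tends to concentrate on a few words near the language boundary, and an average over a long sentence can wash that out.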

The Impact

This research highlights the need for voice agents that can handle the complexities of code-switched speech. Its findings matter for voice agents in enterprise settings, particularly contact centers and IT helpdesks that serve largely bilingual customer bases. As voice agents become more common in these settings, the ability to handle code-switched speech will be crucial to preventing errors and protecting the security and integrity of customer interactions. The consequences of inaccurate transcription can be severe, ranging from frustrated customers to compromised security.

Taking Action

Developers and researchers can act on these findings now. A practical first step is to use the Hugging Face benchmark to measure how ASR models perform on code-switched speech, then use the results to develop more inclusive models and improve transcription accuracy. Further gains depend on collecting and annotating more code-switched speech data and on collaborating with linguists and language experts, who can help explain how code-switching works in practice. Taken together, these steps lead to more effective and inclusive voice agents that meet the needs of multilingual users.


