DYNAMIC ADAPTATION OF SPEECH SYNTHESIS BY AN AUTOMATED ASSISTANT DURING AUTOMATED TELEPHONE CALL(S)

Invented by Sasha Goldshtein
Automated assistants are everywhere today, but they often sound stiff and robotic. Imagine if these assistants could change their voices during a call to better match the person or system they are speaking to. This patent application introduces a smart way for automated assistants to pick and switch voices on the fly, making them sound more human and helpful. Let’s explore how this technology works, why it matters, and what makes it new.
Background and Market Context
People use automated assistants for many things, like making calls, booking reservations, or checking information. These assistants might live on your phone, a smart speaker, or even in your car. When you ask them to do something, they often need to call a business or another person on your behalf. Most of the time, the voice you hear from the assistant doesn’t change. It uses the same tone, accent, and speed, no matter who is on the other end.
The problem is, a single robotic voice can sound odd or even annoying. If the assistant calls a restaurant in New York but sounds like it is from California, the conversation can feel awkward. Some people might not even realize they’re talking to a machine, but when they do, they might feel uncomfortable or frustrated. This can cause problems, like failed reservations or misunderstandings.
Voice technology is growing fast. Companies want automated assistants to sound more natural and be more useful. People expect these assistants to understand them and respond in a way that feels friendly and familiar. Businesses that use automated assistants hope to make things faster and easier for customers. But if the voice is jarring or hard to understand, it can ruin the experience.
There’s also a growing need for assistants to work with both people and other machines. Many businesses use interactive voice response (IVR) systems or voice bots. These systems are designed to talk to humans, but now more machines are talking to each other. An assistant calling another bot might need to speak differently than when talking to a person.
In short, the market wants automated assistants that can:
— Sound more human and less robotic
— Change the way they speak depending on who they are talking to
— Make calls smoother, faster, and more successful
— Save time and resources for both users and businesses
The solution described in this patent is aimed right at these needs. By dynamically changing voices and speech styles, an assistant can fit in better, make people more comfortable, and get more things done without hiccups.
Scientific Rationale and Prior Art
Let’s look at how voice assistants have worked up to now. Most automated assistants use a technology called text-to-speech (TTS). TTS systems take text and turn it into spoken words. Early TTS models sounded very robotic. Recent advances make voices sound more human, with better intonation, rhythm, and emotion. But even with these improvements, most assistants still use just one voice for each user or device.
Some systems let users pick from a few preset voices, like male or female, or different accents. Once you pick a voice, though, it stays the same throughout the conversation. This means if the assistant starts a call with one voice, it keeps that voice until the call ends. If the voice doesn’t match the person or system on the other end, the conversation can feel mismatched.
Researchers have tried to improve TTS by making the speech sound more natural. They use machine learning models trained on lots of human speech. Some systems can change things like speed, pitch, or emphasis. A few advanced assistants can adjust their responses based on the conversation’s mood. But these changes are usually small and not dynamic enough to switch between totally different voices mid-call.
There are also systems that try to guess the best way to say a name or address, especially if it’s hard to pronounce. For example, some assistants will spell out a tricky name letter by letter. Others might slow down or pause so the listener can understand. But these features are usually hard-coded and only work in special cases.
So, what’s missing? Most past systems:
— Use one static voice per call, chosen before the call starts
— Don’t listen to or analyze the other caller’s voice to adjust their own
— Don’t switch voices if the conversation changes, like when a call is handed from an IVR to a human
— Can’t dynamically adapt speech style for different tasks or listeners
This patent moves past these limits. It describes a system where the assistant can start with one voice, listen to the other side, and switch to a different voice that fits better. It can match accents, change speech patterns, and even pick a new TTS model mid-call. It also adapts how it says hard words or personal information, like spelling out names if needed, or adding pauses when talking to a human. The system can tell if it’s talking to a person or another machine and adjust on the fly.
This is a big step forward. Instead of sounding the same every time, the assistant becomes more flexible and responsive. It can make calls feel smoother, less awkward, and more successful.
Invention Description and Key Innovations
Let’s break down what this invention does and what makes it special.
1. Starting the Call: Picking the First Voice
When the assistant is about to make a call, it chooses a starting voice. This choice can depend on things like:
— The type of business or person being called (like a restaurant or a bank)
— Where the business is located (for example, a pizzeria in New York vs. one in Texas)
— Whether the call is to a landline or a mobile phone
The assistant uses this information to pick a voice that will likely sound most “normal” to the person or system on the other end. For example, if calling a business in the Midwest, it might use a voice with a Midwestern accent.
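To make that concrete, here is a minimal sketch of what the initial lookup might look like, assuming a hypothetical catalog of TTS voice profiles keyed by callee type and region. The profile names and selection rules are invented for illustration; the patent does not prescribe a particular data structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceProfile:
    name: str             # identifier of the underlying TTS model
    accent: str           # regional accent the model was trained on
    speaking_rate: float  # 1.0 = normal speed

# Hypothetical catalog; a real system would hold many more profiles.
VOICE_CATALOG = {
    ("restaurant", "NY"): VoiceProfile("tts-ny-casual", "new_york", 1.05),
    ("restaurant", "TX"): VoiceProfile("tts-southern-warm", "texan", 0.95),
    ("bank", "ANY"): VoiceProfile("tts-neutral-formal", "general_american", 1.0),
}
DEFAULT_VOICE = VoiceProfile("tts-neutral", "general_american", 1.0)

def pick_initial_voice(callee_type: str, region: str) -> VoiceProfile:
    """Pick a starting voice from metadata known before dialing."""
    return (VOICE_CATALOG.get((callee_type, region))
            or VOICE_CATALOG.get((callee_type, "ANY"))  # regionless fallback
            or DEFAULT_VOICE)

print(pick_initial_voice("restaurant", "NY").name)  # tts-ny-casual
```

A production system would draw on far richer signals, but the shape of the decision is the same: call metadata in, voice profile out.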
2. Listening and Analyzing During the Call
Once the call starts, the assistant doesn’t just talk — it listens. It analyzes the other caller’s voice, accent, word choice, and even the type of system on the other end (human or bot). The assistant listens for clues like:
— Does the person have a strong local accent?
— Is the greeting coming from a human, an IVR, or another bot?
— Has the conversation been handed off from a machine to a person, or vice versa?
The system uses machine learning to make sense of these clues. It can detect the language, accent, and rhythm of the other side. If it realizes the starting voice isn’t the best match, it can decide to switch.
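A real implementation would run trained audio models here. As a rough stand-in, the sketch below applies keyword heuristics to a transcript to guess who is on the line; every marker phrase and rule is an invented assumption, not the patent’s actual classifier.

```python
def classify_other_party(transcript: str) -> str:
    """Guess whether the far end is a human, an IVR menu, or a bot."""
    text = transcript.lower()
    # Invented marker phrases standing in for trained audio models.
    ivr_markers = ("press 1", "press one", "main menu", "para español")
    bot_markers = ("virtual assistant", "automated agent", "i am a bot")
    if any(marker in text for marker in ivr_markers):
        return "ivr"
    if any(marker in text for marker in bot_markers):
        return "bot"
    return "human"

def should_switch_voice(detected_accent: str, current_accent: str) -> bool:
    """Flag a mid-call switch when the detected accent differs from ours."""
    return detected_accent != current_accent

print(classify_other_party("Thanks for calling. Press 1 for reservations."))  # ivr
print(should_switch_voice("new_york", "general_american"))                    # True
```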
3. Switching Voices Mid-Call
Here’s where the magic happens. If the assistant figures out that a different voice would work better, it can switch — even in the middle of the call. The switch can happen:
— Before the assistant says anything (for example, after hearing the business’s greeting)
— After the first response, if it turns out the other side is different than expected
— When the call is passed from a bot to a human, or between different people
The assistant doesn’t just change the sound of the voice. It can use a completely different TTS model. Each voice can have its own way of speaking, with different intonation, speed, and emphasis. For example, it might switch from a slow, clear voice for an IVR to a warmer, friendlier voice for a human.
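Here is a small sketch of what that turn-boundary switching might look like, with invented profile names like tts-slow-clear and tts-warm-friendly. The IVR-versus-human rule mirrors the example above; the code structure itself is an assumption, not the patent’s actual logic.

```python
class CallSession:
    def __init__(self, initial_voice: str):
        self.voice = initial_voice  # name of the active TTS model

    def maybe_switch(self, other_party: str) -> None:
        """Swap the TTS model at a turn boundary if the far end changed."""
        wanted = "tts-warm-friendly" if other_party == "human" else "tts-slow-clear"
        if self.voice != wanted:
            self.voice = wanted

    def say(self, text: str) -> None:
        # A real system would synthesize audio; we just tag the output.
        print(f"[{self.voice}] {text}")

session = CallSession("tts-slow-clear")
session.say("Reservation for two, please.")   # talking to the IVR
session.maybe_switch("human")                 # call handed off to a person
session.say("Hi! I'd like a table for two at seven.")
```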
4. Adapting Unique Personal Identifiers
Sometimes, the assistant needs to say hard-to-pronounce words, like unusual names, emails, or codes. This can be tricky, especially if the listener doesn’t understand right away. The invention adds a smart way to decide how to say these “unique personal identifiers.”
The assistant looks at things like:
— How common the name or identifier is (Is “John” more common than “Carlsen”?)
— How long or complicated it is (Is it just a name, or a mix of letters and numbers?)
— Who is listening (Is it a human or a bot?)
Based on this, the assistant may:
— Spell out the name or code, letter by letter, if it’s rare or tricky
— Say it all at once, if it’s common or simple
— Add pauses between letters or words, especially if talking to a person
— Skip pauses when talking to another machine, since bots can process speech faster
This helps avoid confusion. If the person on the other end would have trouble understanding, the assistant slows down and spells things out. If it’s talking to a bot, it goes faster and skips the extra steps.
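The decision logic might look something like the sketch below. The frequency list, the length threshold, and the pause handling are all assumptions made for illustration; the patent leaves the exact criteria to the implementation.

```python
# Stand-in for a frequency database of common names and identifiers.
COMMON_NAMES = {"john", "mary", "smith"}

def render_identifier(identifier: str, listener: str) -> str:
    """Return the text the TTS engine should speak for an identifier."""
    rare = identifier.lower() not in COMMON_NAMES
    complicated = len(identifier) > 8 or any(ch.isdigit() for ch in identifier)
    if rare or complicated:
        # Spell it out; add pauses (commas) only for human listeners,
        # since bots can parse rapid letter sequences without them.
        separator = ", " if listener == "human" else " "
        return separator.join(identifier.upper())
    return identifier  # common and simple: say it as one word

print(render_identifier("John", "human"))     # John
print(render_identifier("Carlsen", "human"))  # C, A, R, L, S, E, N
print(render_identifier("Carlsen", "bot"))    # C A R L S E N
```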
5. Handling Different Use Cases
The system is flexible. It works for:
— Calls started by a user, like booking a table or checking store hours
— Calls started by the assistant itself, like checking on product availability after seeing a spike in user queries
— Cloud-based assistants that handle lots of calls at once, or local assistants that work just for one user
After the call, the assistant can also send a notification to the user about the result, or update a database if it was working on behalf of a group of users.
6. Technical Summary: How It Works
The system is made up of several parts:
— Engines that handle user input, voice selection, conversation analysis, and voice modification
— Databases of voices, prosodic properties (like speed and tone), and unique personal identifiers
— Machine learning models that can analyze speech, detect accents, and decide when to switch voices or speech styles
— Logic to pick the best voice, adjust as needed, and render the right speech in real time
All these parts work together to make the assistant more responsive and natural. The system can run on the user’s device, in the cloud, or both.
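As a rough picture of how those parts might fit together, the sketch below wires placeholder engines and databases into a single turn-handling loop. None of the names or rules come from the patent; they only show the shape of the pipeline.

```python
# Invented stand-ins for the voice database and prosodic mapping.
VOICES = {"slow-clear": {"rate": 0.9}, "warm": {"rate": 1.0}}
PROSODY_FOR_PARTY = {"ivr": "slow-clear", "human": "warm"}

class Assistant:
    def __init__(self, analyzer, renderer):
        self.analyzer = analyzer    # conversation analysis engine
        self.renderer = renderer    # speech rendering engine
        self.voice = "slow-clear"

    def on_turn(self, incoming: str, reply: str) -> None:
        party = self.analyzer(incoming)                         # analyze
        self.voice = PROSODY_FOR_PARTY.get(party, self.voice)   # adapt
        self.renderer(self.voice, VOICES[self.voice]["rate"], reply)  # render

bot = Assistant(
    analyzer=lambda text: "ivr" if "press" in text.lower() else "human",
    renderer=lambda voice, rate, text: print(f"[{voice} @ {rate}x] {text}"),
)
bot.on_turn("Press 1 for reservations.", "One.")
bot.on_turn("Hi, this is Dana.", "Hi! A table for two at seven, please.")
```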
Key Innovations
What sets this invention apart?
1. Dynamic Voice Switching: The assistant can pick a new voice mid-call, based on live analysis of the conversation.
2. Context Awareness: It listens for cues about who is on the other side, what their accent or style is, and what stage of the call it is in.
3. Tailored Speech for Names and Codes: The assistant can decide how to say tricky words, spell them out, or add pauses, depending on the listener and the word.
4. Seamless Integration: The system works with both people and machines, and can run locally or in the cloud.
5. Resource Efficiency: By making conversations smoother, it helps complete tasks faster, saving time and computing power.
Put simply, this patent brings a new level of adaptability to automated assistants. It lets them sound less like robots and more like real people, while also making sure they get the job done right.
Conclusion
Automated assistants are becoming a bigger part of our daily lives. But for them to be truly helpful, they need to talk like us, understand us, and adapt as conversations change. The technology described in this patent application is a big step in that direction. By letting assistants pick and switch voices, adapt their speech to the person or system on the other end, and handle tricky words in smarter ways, this invention makes automated calls feel more natural and less frustrating.
As businesses and users demand more from their digital helpers, innovations like this will help bridge the gap between humans and machines. Soon, you might not even notice when you’re talking to an assistant — because it will sound just right, every time.
To read the full application, visit https://ppubs.uspto.gov/pubwebapp/ and search for publication number 20250218423.