Discovering Novel Artificial Neural Network Architectures

Invented by Ilya A. Balabin and Adam P. Geringer; assigned to Blaize, Inc.
Artificial intelligence keeps changing fast. Every day, computers learn to understand sounds, recognize words, and even translate languages better than before. But how do these computers get so smart? The answer is in how their “brains”—called artificial neural networks (ANNs)—are built and improved. Today, we will explore a new way to build these smart brains for phones that listen, understand, and speak in different languages. This new method uses something called guided evolutionary growth. Let’s break it down so anyone can understand how this works, why it matters, and what makes it special.
Background and Market Context
Think about using your phone to talk to someone who speaks a different language. Maybe you ask a question in English, and your phone repeats it in Spanish for your friend. This is not magic—it’s technology. Your phone listens to your voice, changes it into text, translates that text into another language, and then speaks the new words out loud. All this happens in just a few seconds. Behind it all are complex systems called artificial neural networks (ANNs), which are loosely modeled on how the human brain works and learn to connect sounds and words together.
But building the best neural networks for these jobs is hard. Each network is made up of many small parts, like building blocks. These blocks, called neurons, are connected in layers. The way you put these blocks together—the architecture—determines how well the network understands and speaks. Most neural networks used today are designed by people. But there are so many ways to put these blocks together that most of the possible designs are never even tried. Some of those unknown designs could work much better than the ones we have now.
Right now, companies spend a lot of time and money testing different network designs to find the best ones. This is called network architecture search. But it takes huge computers and lots of power to search through all the possible ways to connect these blocks. That means only a tiny part of all possible network designs ever gets tested.
Imagine if there was a better way—a faster and smarter way—to explore all these possible designs and find new networks that work better for translating speech, recognizing voices, or doing other language tasks. That is what this new patent is about. It promises to help us discover new neural network designs quickly, using less power and time, so our phones and computers can get even smarter.
The market for voice assistants, real-time translators, and smart devices is huge and growing fast. People want to talk to their devices and have them understand and respond in any language. Schools, businesses, travelers, and even kids playing games all need better language tools. The company or team that finds the best way to build smarter neural networks will have a big advantage in this space.
That’s why finding new ways to create these networks is so important. The approach described in this patent is a big step forward. It uses what we already know, but then goes far beyond, searching for new ideas that people may have never thought of. It promises to make devices smarter, faster, and able to do things we can only imagine today.
Scientific Rationale and Prior Art
To really understand why this new invention matters, we need to look at how neural networks work and what has been tried before.
A neural network is a group of virtual “neurons” that take information (like sound), pass it through several layers, and then produce an answer (like text or a translation). Each neuron has a simple job, but together, they can learn to do very hard things. The way these neurons are connected—the network’s architecture—matters a lot. Even small changes can make a big difference in how well the network works.
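To make the idea of layered neurons concrete, here is a minimal sketch of a tiny network’s forward pass. The weights, biases, and the ReLU activation below are illustrative choices for the example, not details from the patent:

```python
def relu(x):
    # Rectified linear unit: a common, very simple neuron activation.
    return max(0.0, x)

def layer(inputs, weights, biases):
    # Each output neuron sums its weighted inputs, adds a bias,
    # and passes the result through the activation function.
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# A toy two-layer network: 3 inputs -> 2 hidden neurons -> 1 output.
hidden = layer([0.5, -1.0, 2.0],
               [[0.1, 0.2, 0.3], [0.4, -0.5, 0.6]],
               [0.0, 0.1])
output = layer(hidden, [[1.0, 1.0]], [0.0])
```

Even in this toy, changing the architecture—how many layers, how many neurons per layer—changes what the network can compute, which is exactly the design space the patent explores.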
Traditionally, people design these networks by hand, using their experience and some trial and error. They might start with a simple design, try it out, and then change it to see if it works better. There are a few common types of neural networks—like fully connected ones, convolutional ones for images, and recurrent ones for sequences like speech. Each type has strengths and weaknesses. But the world of possible designs is huge, and it’s impossible for people to test every combination by hand.
To help with this, researchers have created automated tools called Neural Architecture Search (NAS) systems. These systems use computers to try many different network designs, looking for the best ones. Some use random changes, others use more guided searches, and some even use ideas from evolution—like mutation and selection—to grow better networks over time. But even these systems have problems. They often need huge amounts of computer power and time. They can only search a small part of all possible designs before running out of resources.
Another idea that has been tried is to use “evolutionary algorithms.” These are inspired by how animals evolve in nature. They start with some simple networks, make small changes (mutations), and keep the ones that work best. Over many generations, the networks get better and better. But again, because there are so many possibilities, these methods still take a lot of time and power.
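A bare-bones evolutionary search, stripped of any neural-network details, can be sketched as follows. The toy fitness function and mutation step here are stand-ins chosen just to show the mutate-and-select loop:

```python
import random

def fitness(x):
    # Toy objective: the closer x is to 10, the fitter the candidate.
    return -abs(x - 10.0)

def evolve(generations=200, population_size=8, seed=0):
    rng = random.Random(seed)
    # Start with a population of random candidates.
    population = [rng.uniform(-50, 50) for _ in range(population_size)]
    for _ in range(generations):
        # Mutation: nudge each survivor slightly to create offspring.
        offspring = [x + rng.gauss(0, 1.0) for x in population]
        # Selection: keep the fittest half of parents plus offspring.
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:population_size]
    return max(population, key=fitness)

best = evolve()
```

Replace the number `x` with a whole network architecture and the loop becomes an architecture search—which is where the cost problem described above comes from.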
A big challenge has always been how to measure if a new network is really “new” and different from what already exists. If we just keep making small changes, we might end up with networks that are not much better than before. What if there was a way to not just find better designs, but also to make sure those designs are truly new—exploring parts of the network world that no one has tried before?
One more problem with old methods is that they often have to train each new network from scratch. Training a network means showing it lots of examples and adjusting it so it gets better at its task. This can take a long time, especially for big networks. If there was a way to only train the new or changed parts of a network, it could save lots of time and computer power.
The new invention described in this patent tries to solve these problems. It builds on past ideas but adds new tools for guiding the search, measuring how new a network really is, and making the whole process faster and smarter. It uses the idea of “fingerprints” for datasets and networks—a way to describe them simply, so we can quickly compare how similar or different they are. It guides the search to explore new areas, not just the ones we already know. And it finds ways to train networks quickly, only retraining the parts that have changed.
By putting all these ideas together, this new method promises to find new, better, and truly different networks much faster and with less computer power than before. That means smarter phones, better translators, and more powerful AI for everyone.
Invention Description and Key Innovations
Let’s look closely at how this new method works and what makes it special. The invention is a step-by-step process for discovering new neural network designs, especially for devices like phones that need to listen, understand, and speak in different languages. The goal is to find networks that do their job well, are truly new, and are found quickly without using too much computer power.
The process starts with a real-world task—like translating spoken words from one language to another. The phone listens to a person speaking in the first language using its microphone. The task is to turn that spoken audio into text, translate the text into a new language, and then speak it out loud in the second language. To do all this, the phone needs a neural network that is just right for this job.
Here’s how the new method finds the best network for the job:
First, it looks at what networks and datasets already exist. It picks several networks that are the same kind as the ones used for similar tasks (like audio-to-text or translation), and it also looks at the datasets—collections of sound or speech data—that have been used to train those networks.
Next, it gives each dataset a “fingerprint.” A fingerprint is a short code or set of numbers that describes what makes the dataset unique. It does the same for the new dataset the phone will use. By comparing these fingerprints, the system can quickly see which old dataset is most like the one it has now.
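The patent leaves the exact fingerprint encoding open, so here is one simple illustration: summary statistics as the fingerprint, and Euclidean distance as the similarity measure. Both choices, and the sample data, are assumptions made for the example:

```python
import math
import statistics

def dataset_fingerprint(samples):
    # A toy fingerprint: a short vector of summary statistics
    # that describes the dataset.
    return (statistics.mean(samples),
            statistics.pstdev(samples),
            min(samples),
            max(samples))

def distance(fp_a, fp_b):
    # Euclidean distance between fingerprints: smaller means more alike.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))

old_speech = [0.1, 0.4, 0.35, 0.2]   # stand-ins for extracted audio features
old_music = [0.9, 0.8, 0.95, 0.7]
new_data = [0.15, 0.38, 0.3, 0.22]

candidates = {"speech": dataset_fingerprint(old_speech),
              "music": dataset_fingerprint(old_music)}
new_fp = dataset_fingerprint(new_data)
closest = min(candidates, key=lambda name: distance(candidates[name], new_fp))
```

The point is that comparing two short vectors is far cheaper than comparing two full datasets.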
Once it finds the closest matching old dataset, it splits it into two parts: one for training and one for testing. This way, it can use one part to teach the network and the other part to see how well the network has learned.
Then, the system gives each existing network a fingerprint, too. This fingerprint includes all the details about how the network is built—the layers, the connections, and the special settings called hyperparameters.
The system then measures how similar or different each network is, using their fingerprints. It uses math to compare them, so it knows which networks are close to each other and which ones are very different.
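A network fingerprint can be sketched the same way: pack a few architectural facts and hyperparameters into a numeric vector and measure distances between vectors. The fields and scaling below are illustrative assumptions, not the patent’s actual encoding:

```python
import math

def network_fingerprint(arch):
    # A toy architecture fingerprint: depth, total width, and two
    # hyperparameters, packed into one numeric vector.
    return (len(arch["layers"]),
            sum(arch["layers"]),
            arch["learning_rate"] * 1000,   # rescaled to be comparable
            arch["dropout"] * 10)

def similarity_distance(fp_a, fp_b):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))

net_a = {"layers": [64, 64, 32], "learning_rate": 0.001, "dropout": 0.1}
net_b = {"layers": [64, 64, 48], "learning_rate": 0.001, "dropout": 0.1}
net_c = {"layers": [512, 512, 512, 256], "learning_rate": 0.01, "dropout": 0.5}

d_ab = similarity_distance(network_fingerprint(net_a), network_fingerprint(net_b))
d_ac = similarity_distance(network_fingerprint(net_a), network_fingerprint(net_c))
```

Here `net_a` and `net_b` differ in only one layer width, so their distance is small; `net_c` is a very different design, so its distance from `net_a` is large.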
After that, the process starts with the simplest possible network that can do the job. This network is very small, with just enough layers to work. This is like starting with a tiny plant before growing a big tree.
Now the real magic happens. The system creates many new networks by making small changes to the simple one—adding, removing, or changing layers and connections. Each new network is a “next-generation candidate.” The system gives each new network a fingerprint and checks how similar it is to the old ones. The goal is to make networks that are not just good at the task, but also different from the ones that came before.
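The mutation step—adding, removing, or resizing layers—might look like the sketch below, where an architecture is reduced to a list of layer widths. The operators and width choices are illustrative:

```python
import random

def mutate(arch, rng):
    # Apply one random structural change: add, remove, or resize a layer.
    layers = list(arch)   # copy, so the parent network is left unchanged
    op = rng.choice(["add", "remove", "resize"])
    if op == "add":
        layers.insert(rng.randrange(len(layers) + 1), rng.choice([16, 32, 64]))
    elif op == "remove" and len(layers) > 1:
        layers.pop(rng.randrange(len(layers)))
    else:
        i = rng.randrange(len(layers))
        layers[i] = max(1, layers[i] + rng.choice([-16, 16]))
    return layers

rng = random.Random(42)
parent = [32, 32]                      # the simple starting network
candidates = [mutate(parent, rng) for _ in range(5)]
```

Each candidate would then get its own fingerprint, so the system can tell which mutations actually moved into new territory.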
Here’s where the evolutionary growth comes in. The system tests all the new networks by retraining them—but only retraining the parts that have changed. This saves a lot of time and power. It then checks how well each one does on the test data.
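The idea of retraining only the changed parts can be sketched by marking which layers are frozen. The per-layer “weight” and the toy update rule below are stand-ins for real gradient training:

```python
def retrain_changed_only(layers, changed, step=0.1):
    # Each layer here is a dict with one toy weight. Only layers whose
    # index is in `changed` are updated; the rest stay frozen, which is
    # what saves time when most of the network is unchanged.
    for i, layer in enumerate(layers):
        if i in changed:
            # Toy update rule: nudge the weight toward a target of 1.0.
            layer["weight"] += step * (1.0 - layer["weight"])
    return layers

network = [{"weight": 0.5}, {"weight": 0.5}, {"weight": 0.5}]
retrain_changed_only(network, changed={2})   # only the new last layer moves
```

In a real framework this corresponds to freezing the parameters of inherited layers and training only the newly added or modified ones.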
To decide which networks are best, the system uses a “fitness score.” This score looks at two things: how well the network does the task and how different it is from all the old networks. The best networks are both high-performing and new.
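A fitness score that blends the two criteria could be as simple as a weighted sum. The weighting `alpha` and the example numbers are assumptions for illustration:

```python
def fitness(accuracy, novelty, alpha=0.7):
    # Weighted blend of task performance and novelty (e.g. distance to
    # the nearest existing network); alpha is an assumed trade-off knob.
    return alpha * accuracy + (1 - alpha) * novelty

# Two candidates: one slightly more accurate, one much more novel.
incremental = fitness(accuracy=0.90, novelty=0.10)
explorer = fitness(accuracy=0.85, novelty=0.60)
```

With this weighting, the slightly-less-accurate but much-more-novel candidate wins, which is exactly how the search is pushed into unexplored parts of the design space.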
The networks with the best fitness scores survive and become the base for the next round of changes. The process repeats, each time creating new networks, testing them, and keeping the best ones. Over time, the networks get better and more unique, exploring new areas of possible designs that people might have never tried before.
When the system sees that the new networks are not improving much, it stops. The best network found becomes the one used for the real task on the phone.
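Putting the pieces together, the generational loop with its plateau-based stopping rule might be sketched like this. Everything here—the scoring function, the mutation operators, the population size, the patience threshold—is a toy stand-in for the patent’s general procedure:

```python
import random

def evaluate(arch):
    # Toy stand-in for "retrain the changed parts and test": the score
    # peaks for networks with about four layers of average width ~64.
    return -abs(len(arch) - 4) - abs(sum(arch) / len(arch) - 64) / 64

def mutate(arch, rng):
    layers = list(arch)
    i = rng.randrange(len(layers))
    choice = rng.random()
    if choice < 0.3:
        layers.insert(i, 32)                        # add a layer
    elif choice < 0.5 and len(layers) > 1:
        layers.pop(i)                               # remove a layer
    else:
        layers[i] = max(8, layers[i] + rng.choice([-8, 8]))  # resize
    return layers

def guided_search(seed=0, patience=10):
    rng = random.Random(seed)
    population = [[16]]               # the simplest viable network
    best, stale = population[0], 0
    while stale < patience:           # stop once progress plateaus
        children = [mutate(p, rng) for p in population for _ in range(4)]
        population = sorted(population + children, key=evaluate,
                            reverse=True)[:6]
        if evaluate(population[0]) > evaluate(best):
            best, stale = population[0], 0
        else:
            stale += 1
    return best

winner = guided_search()
```

Because the best network always survives into the next generation, the score can only improve over time, and the loop halts once `patience` generations pass without any improvement.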
What makes this method truly special is how it guides the search—not just looking for better networks, but making sure they are actually different from what’s already out there. It uses fingerprints and similarity math to map out the whole space of possible designs. It saves time by only retraining the parts that have changed. And it can be used not just for audio translation, but for any task where neural networks are needed—like computer vision, chatbots, and more.
The system can also work across many devices and sensors, not just phones. It can use cameras, microphones, or any sensor that collects data. It can even link several networks together, so one network’s output becomes another’s input. This means it can build very smart systems that do many jobs in a row, all using networks discovered by this smart evolutionary process.
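Chaining networks so that one’s output feeds the next’s input is just function composition. The three functions below are crude string-based stand-ins for real speech, translation, and synthesis networks, and the tiny glossary is invented for the example:

```python
def transcribe(audio):
    # Stand-in for a speech-to-text network.
    return audio.replace("~", "")          # pretend "~" marks noise

def translate(text):
    # Stand-in for a text-to-text translation network.
    glossary = {"hello": "hola", "friend": "amigo"}
    return " ".join(glossary.get(word, word) for word in text.split())

def synthesize(text):
    # Stand-in for a text-to-speech network.
    return f"<audio:{text}>"

def pipeline(audio):
    # Each network's output becomes the next network's input.
    return synthesize(translate(transcribe(audio)))

result = pipeline("hello~ friend~")
```

In the patented system, each stage of such a pipeline could itself be a network discovered by the guided evolutionary search.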
By using this method, companies and developers can build smarter, faster, and more versatile AI for anything from cars that see the road to apps that talk in any language. The process opens up a whole new world of possibilities for building the brains of tomorrow’s smart devices.
Conclusion
Finding the best neural networks for tasks like audio translation has always been a big challenge. Old methods are slow and can’t explore all the possibilities. This new invention changes the game. By using fingerprints to compare datasets and networks, guiding the search to find truly new designs, and only retraining what’s needed, it brings us closer to creating smarter, faster, and more creative AI. This means better voice assistants, more accurate translators, and more helpful smart devices in our daily lives. As this technology spreads, we can expect our phones, cars, and computers to understand us better—no matter what language we speak.
To read the full patent application, visit https://ppubs.uspto.gov/pubwebapp/ and search for publication number 20250217653.