Tens of millions of people with severe speech impediments need computerized voices to speak, but only a few voice options are on the market. This means a middle-aged man could have the same synthetic voice as a preteen girl.
Since her TED Talk went viral in February 2014, over 16,000 people have signed up to donate their voice to The Human Voice Bank Initiative, a sound collection system Patel is hastening to finish developing. Patel will have them speak a few hundred to a few thousand utterances over three to four hours and then take their enunciations and mix them with the vocal cord vibrations of the speech-impaired to create a unique voice. Her six-year-old daughter calls it "mixing colors to paint voices."
Speech technology research stagnated by the end of the 1980s. We were happy machines could talk, but less interested in what they sounded like. Patel believes custom voices are a fundamental right, not an aesthetic accessory: "With most disabilities if we give people something adequate to meet their needs, then we don't try to perfect it."
Patel, who is also an electrical and computer engineering professor at Northeastern University, says her goal is to reach a time when no two synthetic voices are the same.
Could you paint me a picture of how this technology works?
Speech synthesis takes text and turns it into speech. Take someone who uses an assistive communication device - there are lots of them on the market. Your phone is one, and the voice on it is a piece of software that can be swapped out. Our synthesizer turns their words into their own speech. So rather than use a generic voice, we now know how to make that voice sound like you.
There are two parts of speech. One is the vibration of your vocal cords to create sound; the other is moving your tongue and lips to shape that sound. People with severe speech disorders have relatively preserved vocal cord vibrations, but the ability to move their tongues or lips to form consonants and vowels has been impaired. We borrow that part from a healthy talker who is about the same size and age.
When you say severe speech impairment, is this something people are usually born with, or is there a wide range of recipients who can utilize the software?
People can be born with severe speech disorders - everything from cerebral palsy to Down syndrome to muscular dystrophy. There are also things you can acquire later in life, like you can have a stroke and be unable to speak, or get Lou Gehrig's disease or multiple sclerosis. It's a wide group of people that could be affected by this technology.
How many voices have you made so far? What has been the reaction from those receiving "their own voice" for the first time?
We've only made a handful of voices for people, but the reaction of the few people who are using the voices is what keeps us going. Many of them say they now use their devices more to communicate. That is absolutely the reason why we're doing this. We get feedback from not only the recipients, but also the caregivers and the parents. One mom said she was finally hearing her daughter for the first time: "She says stuff like, 'Mom this and Mom that.' I never thought I'd get to hear 'mom' so many times." That's beautiful. That's unbelievable.
When do you foresee a large number of people receiving custom synthetic voices? Why have only a few received them?
There is no company that does this. It's been a research project up until now and we have to build a company and then scale it out. We've been trying to perfect the technology. The next step is delivering it to people. My graduate students and I can't do that on our own. So we're building out the team and raising funds.
It seems we are truly at the brink of this research. Let's talk about some specific ways this could transform the way we experience technologies in the future. For example, as the use of robots increasingly expands into the health industry, I can imagine a person with Alzheimer's interacting with a caregiving robot with the voice of his or her grandchild.
The applications are endless. You're right in talking about this as being at the brink of something. Speech is a very natural interface. It is the next modality with which we're going to communicate with our technologies.
Because it's natural, speech doesn't require doing a second step. It's faster than anything else we can do.
There was a huge burst of speech technology research in the 1970s and 1980s. Once it got to a certain level, we were all okay with the fact that our talking machines sounded like machines. So this is the next evolution of that.
I would caution anyone against thinking that what we have today is something that can go in any award-winning movie. But the more time, money and science we put toward this, the closer we're going to get to that.
Do you foresee a profound shift in how we will conceive of the delineations between our bodies and technology? Or do you think we have a ways to go before people are comfortable with reconsidering where our body stops and technology starts?
That's really the Holy Grail: when a technology - particularly a prosthesis - can be seamlessly integrated with the individual. When we think about prosthetic limbs, we make them more and more flesh-like. That's the gold standard we're moving toward. Yet communication technologies are caveman-ish.
I don't know if you've ever had a loved one who has been in an ICU with a tube stuck down their throat. They can't talk, so what do they do? They point to things. If they can't spell or have cognitive issues then they are just... there. They can't express pain, love, or say their last wishes.
Being able to connect with other people is fundamental. Taking this to the extreme would be if I could just tell you my thoughts. But even if I told you my thoughts via some kind of brain interface, they would have to be communicated somehow. Are we going to be thinking things without talking out loud? Embodied in who we are is the fact that we can make speech.
Do you foresee your business rolling out any other applications in the near future?
Not immediately. The things that interest me most are the social good kind of applications. I'm not interested in a defense application, though I am interested in the education aspects.
I imagine defense and entertainment would be the most lucrative avenues to move toward.
We think of entertainment and education as two different verticals, but if entertainment and education can play together you can enjoy the benefit of both - which is happening more with games for learning and health. Partnering education and entertainment would make for a powerful application space.
Is there anything you would like to add?
The really cool thing is the dialogue about the project itself. We now have around 15,000 voice donors - everyday people who want to give their voice to someone else. So now we're building the tools to collect those voices. We're calling this initiative The Human Voice Bank Initiative.
I am really excited about this whole thing. It has taken over my life!