Showing posts with label voice recognition software. Show all posts

Wednesday, January 11, 2017

The Ubiquitous Alexa: is the Amazon AI assistant starting to be everywhere?

Kinda looks that way. The title of the article below refers to cars, but the article itself goes into much more, like Alexa being incorporated into other appliances. Well, have a look:



Alexa will make your car smarter -- and vice versa
The integration into vehicles is yet another sign of how dependent we're becoming on AI.
[...] Within a span of just two years, Amazon's cloud-based voice service has spread far beyond the Echo speaker with which it first debuted. Alexa has gone from being an at-home helper to a personal assistant that can unlock your car, make a robot dance and even order groceries from your fridge.

At CES, both Ford and Volkswagen announced that their cars would integrate Alexa for weather updates, navigation and more. According to CJ Frost, principal architect solutions and automotive lead at Amazon, the car industry is moving into a mobility space. The idea isn't restricted to the ride anymore; it encompasses a journey that starts before you even get in the car. With the right skills built into the voice service, you can start a conversation with Alexa about the state of your car (is there enough fuel? is it locked? etc.) before you leave the house. It can also pull up your calendar, check traffic updates and confirm the meeting to make sure you're on track for the day.

Using a voice service in the car keeps your connection with the intelligent assistant intact. It's also a mode of communication that will be essential to autonomous cars of the near future. I caught up with Frost and John Scumniotales, general manager of Automotive Alexa service, at the Las Vegas convention center to trace the progression of the intelligent assistant from home speakers to cars on the road. [...]
The rest of the article is in an interview format, discussing where this is all going, and how and why, and what the future holds. Read the whole thing for embedded links, photos, video and more.

There have been lots of reviews on YouTube comparing Alexa with Google Home. People who use a lot of Google services claim the Google device is smarter and therefore better. But it's not that simple.

I have both devices. If you ask your question of Alexa in the format of: "Alexa, Wikipedia, [your question here]", the answer you get will often be as good as or better than what Google can tell you. Alexa has been around longer, has wider integration, and has more functions available. It can even add appointments to my Google Calendar, which Google Home says it cannot do yet!

Google Home does have some features it excels at, such as translating English words and phrases into foreign languages. If you own any Chromecast dongles, you can cast music and video to other devices, which is pretty cool. Presently its biggest drawback is the lack of applications developed to work with it. However, its POTENTIAL is very great, and a year or two from now we may see a great deal more functionality. It has the advantage of access to Google's considerable database and resources. It could quickly catch up with Alexa, and perhaps surpass it. But that remains to be seen.

It's not hard to make a video that makes one device look dumber than the other. But in truth the devices are very similar. Both can make mistakes, or fail at questions or functions. Sometimes one does better than the other. I actually like having both. It will be interesting to watch them both continue to evolve. To see if Google can close the gap created by Amazon's early head start. To see how the two products will differentiate themselves over time.

For the present, if you require a lot of integration with 3rd party apps and hardware, and if you are already using Amazon Prime and/or Amazon Music services, you might prefer Alexa. If you are heavily into Google services, and/or Google Music or YouTube Red, you might prefer Google Home. Or if you are like me, an Amazon Prime/Music member experimenting with YouTube Red and an owner of Chromecast devices, you may prefer both! Choice is good!
     

Saturday, December 12, 2015

Elon Musk, on OpenAI: “if you’re going to summon anything, make sure it’s good.”

I agree. Will these guys lead the way?

Elon Musk and Other Tech Titans Create Company to Develop Artificial Intelligence
[...] The group’s backers have committed “significant” amounts of money to funding the project, Musk said in an interview. “Think of it as at least a billion.”

In recent years the field of artificial intelligence has shifted from being an obscure, dead-end backwater of computer science to one of the defining technologies of the time. Faster computers, the availability of large data sets, and corporate sponsorship have developed the technology to a point where it powers Google’s web search systems, helps Facebook Inc. understand pictures, lets Tesla’s cars drive themselves autonomously on highways, and allowed IBM to beat expert humans at the game show “Jeopardy!”

That development has caused as much trepidation as it has optimism. Musk, in autumn 2014, described the development of AI as being like “summoning the demon.” With OpenAI, Musk said the idea is: “if you’re going to summon anything, make sure it’s good.”

Brighter Future

“The goal of OpenAI is really somewhat straightforward, it’s what set of actions can we take that increase the probability of the future being better,” Musk said. “We certainly don’t want to have any negative surprises on this front.” [...]
I did a post about that comment of his a while back:

The evolution of AI (Artificial Intelligence)

Nice to see that those who were making the warnings are also actively working to steer the development in positive directions and trying to avoid unforeseen consequences.

I still think real AI is a long way off. But it isn't too soon to start looking ahead, to anticipate and remedy problems before they even occur.
     

Wednesday, December 02, 2015

Oh no, what have I done?

In a weak moment, whilst perusing the Black Friday offerings on Amazon.com, I ordered one:



Amazon Echo
Amazon Echo is designed around your voice. It's hands-free and always on. With seven microphones and beam-forming technology, Echo can hear you from across the room—even while music is playing. Echo is also an expertly tuned speaker that can fill any room with immersive sound.

Echo connects to Alexa, a cloud-based voice service, to provide information, answer questions, play music, read the news, check sports scores or the weather, and more—instantly. All you have to do is ask. Echo begins working as soon as it detects the wake word. You can pick Alexa or Amazon as your wake word. [...]
The features listed with the photo are only a few of the key features. Follow the link for more info, embedded videos, reviews, FAQ and more.

It, "Alexa", arrives tomorrow. I wonder if it will be anything like HAL from the movie 2001: A Space Odyssey? That would be kinda cool, I guess. As long as she isn't the Beta version that murders you while you sleep.

UPDATE 12-08-15: So far, so good. It does everything they said it would. My only complaint: it can't attach to external speakers (but I knew that before I bought it). It was very easy to set up, and it's very easy to use. The voice recognition is really excellent. I can play radio stations from all over the world. When I want info about a song or music, I can ask Alexa, and she will tell me.

There are more features available if I sign up for Amazon Prime ($100 per year, which works out to about $8.33 a month). I'm thinking about it.
     

Monday, January 12, 2015

Skype, with a speech translator?

Supposedly. This was announced last month:



Skype Will Begin Translating Your Speech Today
¿Cómo estás?

Voice over IP communication is entering a new era, one that will hopefully help break down language barriers. Or so that's the plan. Using innovations from Microsoft Research, the first phase of the Skype Translator preview program is kicking off today with two spoken languages -- Spanish and English. It will also feature over 40 instant messaging languages for Skype customers who have signed up via the Skype Translator sign-up page and are using Windows 8.1.

It also works on preview copies of Windows 10. What it does is translate voice input from someone speaking English or Spanish into text or voice. The technology relies on machine learning, so the more it gets used, the better it will be at translating audio and text.

"This is just the beginning of a journey that will transform the way we communicate with people around the world. Our long-term goal for speech translation is to translate as many languages as possible on as many platforms as possible and deliver the best Skype Translator experience on each individual platform for our more than 300 million connected users," Skype stated in a blog post.

Translations occur in "near real-time," Microsoft says. In addition, there's an on-screen transcript of your call. Given the many nuances of various languages and the pace at which communication changes, this is a pretty remarkable feat that Microsoft's attempting to pull off. There's a ton of upside as well, from the business world to use in classrooms.

If you want to test it out yourself -- and Microsoft hopes you do, as it's looking for feedback at this early stage -- you can register for the program by going here.
Follow the link to the original article for embedded links, and a video.

See how it works here:

Skype Translator is the most futuristic thing I’ve ever used
We have become blasé about technology.

The modern smartphone, for example, is in so many ways a remarkable feat of engineering: computing power that not so long ago would have cost millions of dollars and filled entire rooms is now available to fit in your hand for a few hundred bucks. But smartphones are so widespread and normal that they no longer have the power to astonish us. Of course they're tremendously powerful pocket computers. So what?

This phenomenon is perhaps even more acute for those of us who work in the field in some capacity. A steady stream of new gadgets and gizmos passes across our desks, we get briefed and pitched all manner of new "cutting edge" pieces of hardware and software, and they all start to seem a little bit the same and a little bit boring.

Even news that really might be the start of something remarkable, such as HP's plans to launch a computer using memristors for both longterm and working memory and silicon photonics interconnects, is viewed with a kind of weary cynicism. Yes, it might usher in a new generation of revolutionary products. But it probably won't.

But this week I've been using the preview version of Microsoft's Skype Translator. And it's breathtaking. It's like science fiction has come to life.

The experience wasn't always easy; this is preview software, and as luck would have it, my initial attempts to use it to talk to a colleague failed due to hitherto undiscovered bugs, so in the end, I had to talk to a Microsoft-supplied consultant living in Barranquilla, Colombia. But when we got the issues ironed out and made the thing work, it was magical. This thing really works. [...]
Follow the link for more, and enlargeable photos that show what it looks like as it's working.
     

Friday, September 14, 2012

The Next Generation of Computer Chips

Intel's Haswell chips are engineered to cut power use
Intel has released early details of its Haswell computer chips, due for release in the middle of next year.

One version of the processors will run at 10 watts, about half as much as its current Ivy Bridge design.

It said the improvement would mean devices could become thinner, faster and offer extended battery life.

In addition it said the chips were designed to better support "perceptual" tasks such as voice recognition, facial analysis and depth tracking.

[...]

Another innovation on the new chips is a more powerful GPU (graphics processing unit). This is designed to handle tasks in which a large number of calculations can be carried out simultaneously, rather than one-at-a-time.

Speech and face recognition are highly parallelisable tasks and will thus benefit from this improvement.

Intel is working with speech-recognition company Nuance to create a software kit to help developers best unlock the chips' potential.

In addition it suggests Haswell-based computers will also be better suited to tracking objects placed close to their camera sensors allowing further development of gesture controls and augmented reality. [...]

More on the voice aspect:

Intel brings voice search to ultrabooks
Intel is going to integrate a Google Voice-like technology into its future ultrabooks.

By partnering with voice specialist Nuance, Intel will let ultrabook buyers use speech to control their laptop, Dadi Perlmutter, general manager of the Intel Architecture Group, said in a keynote speech at the Intel Developer Forum in San Francisco on Tuesday.

In an onstage demonstration, attendees saw an Intel developer instruct a Dell XPS ultrabook to search the web, look up a product on Amazon, tweet a link to it, and then play some music. All of this was done with voice control.

The software "is running native on the platform. This is not a cloud service, this requires the high-performing CPU and the capabilities inside", Perlmutter said. Intel has worked with Nuance to tune the application for its processors to maximise performance, he said.

The software pairs Nuance's Dragon Assistant technology with Intel-based ultrabooks and should be available as a beta in late 2012 and as a full product in the first quarter of 2013.

It is reminiscent of Google Voice, which lets Android users search the web and control their phone by talking to it. The main difference is that Intel's software is initiated by the user saying 'Hello Dragon' to their computer, while Google typically requires the user to touch the screen.

Nuance's flagship product for PCs is Dragon Naturally Speaking. Furthermore, its technology sits at the heart of Apple's Siri voice search technology. [...]
     

Tuesday, September 13, 2011

"Watson" the game-playing talking super computer is getting a job at your doctors office

But he won't replace your doctor. At least not right away:

IBM's 'Jeopardy' computer lands health care job
NEW YORK (CNNMoney) -- IBM's Watson computer thrilled "Jeopardy" audiences in February by vanquishing two human champs in a three-day match. It's an impressive resume, and now Watson has landed a plum job.

IBM is partnering with WellPoint, a large health insurance plan provider with around 34 million subscribers, to bring Watson technology to the health care sector, the companies said Monday.

[...]

The goal is for Watson to help medical professionals diagnose and sort out treatment options for complicated health issues. Think of the system as an electronic Dr. House.

"Imagine having the ability to take in all the information around a patient's medical care -- symptoms, findings, patient interviews and diagnostic studies," Dr. Sam Nussbaum, WellPoint's (WLP, Fortune 500) chief medical officer, said in a prepared statement.

"Then, imagine using Watson analytic capabilities to consider all of the prior cases, the state-of-the-art clinical knowledge in the medical literature and clinical best practices to help a physician advance a diagnosis and guide a course of treatment," he added.

WellPoint plans to begin deploying Watson technology in small clinical pilot tests in early 2012.

[...]

IBM said early on that health care is a field where it anticipated commercialization opportunities for Watson. Other markets IBM is eying include online self-service help desks, tourist information centers and customer hotlines. [...]

So it's going to be used as a tool, like an interactive voice-activated database. The clinical pilot tests should be interesting. If it doesn't work out, perhaps Watson can get a job as a Radio DJ. "Denise" had better watch out!

I've posted about Watson previously:

      "Watson" won. But did it really?
     

Monday, February 08, 2010

Will literacy become a thing of the past, to be replaced by a new VIVOlutionary "oral" culture?

I came across this book, which seems to predict the end of written language as being not only inevitable, but also as a good thing! Listening replaces reading:


VIVO [Voice-In/Voice-Out]: The Coming Age of Talking Computers
Review
"A welcome addition to the discussion about voice-recognition technology and the social implications of talking computers." -- Edward Cornish, President, World Future Society, Bethesda, Maryland

"Audacious and mind-stretching. Crossman sees our reliance on the printed word coming rapidly to an honorable end." -- Arthur B. Shostak, Drexel University, Philadelphia, Pennsylvania

"If you are an educator, you need to read this book." -- Les Gottesman, Golden Gate University, San Francisco, California

Product Description
A positive look at how talking computers, VIVOs, will make text/written language obsolete, replace all writing and reading with speech and graphics, democratize information flow worldwide, and recreate an oral culture by 2050.

Text is an ancient technology for storing and retrieving information; VIVOs will do the same job more quickly, efficiently, and universally. Among VIVO's potential benefits: 80% of the world's people are functionally nonliterate; they will be able to use VIVOs to access all information without having to learn to read and write.

VIVO's instantaneous translation function will let people speak with other people around the world using their own native languages. People whose disabilities prevent them from reading and/or writing will be able to access all information.

Four "engines" are driving us irreversibly into the VIVO Age and oral culture: human evolution, technological breakthroughs, young people's rejection of text, and people's demand for text-less, universal access to information.

Future generations, using eight key VIVOlutionary learning skills, will radically change education, human relations, politics, the arts, business, our relation to the environment, and even human consciousness itself. Worldwide access to VIVO technology looms as a key human rights issue of the 21st century.

Clearly the trend exists. In my own lifetime I've seen people reading less and less, getting their information from TV, radio, videos and movies more than from reading. But will it go so far as to actually make text and reading obsolete?

Imagine if there is a blackout or prolonged power outage. Nobody can read, because they get all their information from electronic devices that talk to them. Suddenly, everyone is a dumb-ass moron, until the power comes on again? Are we just becoming too dependent on electronic devices? If power goes out for an extended time, due to either natural or man-made causes, an illiterate population with no books would be in double trouble.

Oh Brave New World, with such (illiterate) people in it...
     

Friday, June 05, 2009

Is it Artificial Intelligence? Or Fake People? Or...?

Meet the future. Meet "Milo", a new innovation for our Brave New World:



It's an interesting technology. But in the end, it's a fantasy. It's not a real boy you are talking to, it's a simulation of a real boy. Simulated intelligence. The lights are on, but nobody's home.

As this kind of technology is pursued, you have to wonder what the unintended consequences might be for real people.
     

Sunday, February 15, 2009

Computer Voices and the song "Daisy Bell"

I've been reading about artificial intelligence and computer voices lately, and I came across these videos on YouTube. The first video involves a clip from Arthur C. Clarke's "2001: A Space Odyssey". Remember the scene where Hal is deactivated? As his memory cards are being pulled, Hal's personality regresses to his "childhood" days in the computer lab in Urbana, Illinois. He sings a song he learned there. The song was "Daisy Bell".

It seems that song was used for an historical reason:



The video (1 minute and 39 seconds) claims that a computer in the 1950s was the first computer ever to sing a song. The song was "Daisy Bell".

But another video gets more specific. It says that the first computer to sing a song was in 1961. It was an IBM 7094. The video (1 minute and 51 seconds) gives a sample of the song, with computerized musical accompaniment too, and also gives the names of the programmers who created it:



Another video (with no embedded option) shows a photo of the computer (?) with an audio track of its voice and singing repertoire:

http://www.youtube.com/watch?v=IlBmbt8IVv4

Listening to all this reminds me of my Commodore 64 days. Does anyone remember "The Write Stuff", a Commodore 64 word processor released in 1987 by Busy Bee Software? It could read documents with a computer voice that was very similar to the one in these videos. My Busy Bee software could even sing "Twinkle Twinkle Little Star". It was both funny and painful to listen to.

Nowadays, computer voice technology is so much more advanced. There are a growing number of realistic-sounding computer voices, and an abundance of free or inexpensive TTS (Text-To-Speech) programs to go with them. And Hal-like computer programs to go with those voices are fast approaching, too.

Oh Brave New World, with such people (and artificial-people) in it!


Related Links:

Artificial voice synthesis, 1939 to the present

Ultra HAL, your personal computer assistant

The history and lyrics of the song "Daisy Bell"
     

Sunday, November 30, 2008

Ultra HAL, your personal computer assistant


HAL the talking computer is here, just like in the famous Sci-Fi movie "2001: A Space Odyssey"! Well, ok, not REALLY, but at least this HAL won't lock you outside or kill you while you sleep! Ultra Hal is a fun HAL, and it's here NOW:

Ultra Hal Assistant 6.1 - NEW!
[...] Ultra Hal Assistant is your digital secretary and companion. He (or she depending on your character preference) can remember and remind you of appointments. He can keep an address book. He can keep a phone book, and even dial phone numbers for you! Hal can also run programs and recent documents on command. Hal can help you browse the Internet. He will offer you help with most of your Windows programs. Hal does all of this from natural language -- simply tell him or ask him something in plain English!

Hal has a huge conversational database and can chat about anything at all. Hal will learn from every single sentence that you tell him, and over time Hal will learn to like the same things you do, and to talk about topics you like to talk about. Ultra Hal Assistant even has built-in speech recognition so that you can speak to Hal out loud instead of typing. Ultra Hal utilizes an advanced realtime 3D character engine from Haptek that delivers 3-D artificial human characters so convincing and engaging you could swear they were real. You can download the free trial version from this site. Find out what's new in version 6.1 [...]



Now I have played with the Ultra Hal Assistant a bit... calling it intelligent may be a bit of a stretch... though that may depend on what your definition of Artificial Intelligence is. Hal really does have the capacity to "learn" things from you, and with time and training, it does give seemingly intelligent, even surprisingly clever, replies and comments.

How useful it will actually be remains to be seen. I'm not much of a chit-chat person when talking with real people; chit-chatting with a computerized Artificial Intelligence can seem like rather a waste of time, once the novelty wears off. But giving it commands to look for stuff on Google for you, look up the weather and read it to you, dial phone calls and such, are things some people may find useful. And as Hal learns from you, the chit-chat gets more interesting.

You can use the free trial version, but to buy it costs only $29.95. It can be used as the "brain" for other 3rd party software products. I have to say that what's probably more impressive than what Ultra Hal does, is the potential for what it can become. It's a technology in its childhood, about to grow up.

Remember that Arnold Schwarzenegger movie, The 6th Day? There is one scene where the Arnold character is at the home of a divorced friend. The friend has a female talk-bot on the screen, and they flirt with each other. The Ultra Hal program can work with a variety of 3rd party "characters" that function in much the same way. Check out this on-line, interactive, talking People Putty Demo by Haptek. There are even various plug-ins for Ultra Hal, including alternate voices, brains, speech engines, and animated faces and characters, even flirty sex-bot characters, precursors to the one in the movie. The future is closer than you think.

There are also third-party companies that make realistic "voices" compatible with Ultra Hal. One such company is Cepstral, which makes really good-sounding computer voices. You can download free samples to try out; they come with an easy-to-use text reader program: www.cepstral.com.

The combination of these products will provide you with a near-HAL, "2001: A Space Odyssey"-type experience. It's pretty kewl... just don't plug it into your cryogenic sleep chamber when you go to bed at night. ;)

A HAL type project was implemented by NASA on the International Space Station in 2005:

Clarissa: a HAL type computer for the ISS?

Clarissa is capable of understanding the voices of multiple astronauts, recognizing when astronauts are talking to each other and not to it, dealing with ambient noise, etc. While efficient in those areas, she does not try to make conversation like HAL does.

I'd like to see Clarissa's voice comprehension used in Ultra Hal. Probably the best voice recognition software available to most of us on planet Earth is Dragon Naturally Speaking 10, which is, I believe, what Clarissa uses. The reviews I've read seem to indicate it's the most accurate commercial voice recognition software commonly available. DNS doesn't make idle chit-chat with you like Ultra Hal, but it can be used for speaking commands to your computer, as well as taking dictation.

Older versions of Dragon's speech engine have been compatible with Ultra Hal, but it's unclear to me whether the latest speech engine in Dragon Naturally Speaking can also be used with it. I've read conflicting reports; some say yes, some say there are restrictions that prevent it.

I've found that Ultra HAL had some trouble understanding my voice, because it would hear its own voice speaking in reply and think it was me talking. I eventually gave up trying to teach HAL to understand my speech, which meant I had to use the keyboard to communicate with it. It would be nice if Clarissa's technology could be added to HAL's, making it more of a hands-free experience. As it was, having to keyboard my responses to HAL was not very productive, because I could not do other things while I was communicating with it. I would love to chit-chat with HAL and teach it things, IF my hands were free so I could get on with other things I need to do. If I could verify that the Dragon 10 engine would work with HAL, I'd buy it. Being able to chat with HAL hands-free would be worth it.

HAL does have a "Brain" settings panel, where you can set its learning parameters. You can deliberately teach it things, and see them reflected back to you in its responses. Sometimes the replies it gives really do seem intelligent and even witty. The Ultra HAL program won the 2007 Loebner Prize for the artificial intelligence most able to pass for a human.

An on-line talkbot called Elbot won the prize in 2008. It's an excellent on-line talk-bot, but not available for download onto your computer like HAL is. HAL is the only one I know of that you can download and use on your PC.

If you would like to try to converse with an on-line version of HAL, you can do so here:

Chat with Web HAL

I think all of these things, artificial brains, faces and voices, are combining to become one of those Next Big Things. As these technologies improve, I think it's going to take off and be very BIG. If you want a taste of things to come, download the free trial version of Ultra Hal and play with it. I've had great fun with it.

Zabaware has a page listing bloggers who have reviewed Ultra Hal:

Read what bloggers are saying about Ultra Hal,
an artificial intelligence chatter bot



UPDATE 12-11-08: I received an email dated December 2nd, from Zabaware. Here is an excerpt:

[...] Zabaware is hard at work developing the next version of Ultra Hal Assistant, which will be available in 2009. The new version will be a free upgrade to all Ultra Hal Assistant 6.x users and will include brain improvements and a new 3D graphics engine. Also in 2009 Zabaware will introduce cutting edge speech recognition and microphone technology, which will let you talk with Hal naturally without a clumsy headset microphone. Be sure to check www.zabaware.com in early 2009 for further news.

Sincerely,
Robert Medeksza

Mr. Medeksza is the creator of Ultra Hal. This is excellent news! I'll be looking forward to the upgrade and the improved speech recognition.


UPDATE 01-31-09:
There is a new Release Candidate, Ultra Hal 6.2, available for download on the Hal Forum. I've tried it out, and it's really good. Hal no longer hears his own voice when he speaks, and the overall voice recognition seems greatly improved. There are other improvements as well; I will post more about it when the final release is out. For those who can't wait, check it out.

UPDATE 02-09-09:
The new release, Hal 6.2, is officially out. Free trial download available, free upgrade for current owners. I bought the ViaVoice plugin, and so far, it's very nice. I will do a post about this new version after I have used it for a while.

Get yours here: Ultra Hal Assistant 6.2 - NEW Version!
     

Monday, March 24, 2008

Clarissa: a HAL type computer for the ISS?

Much of the technology portrayed in the famous science fiction film "2001: A Space Odyssey" failed to materialize by the year 2001. The HAL 9000 computer in the movie is no exception.


Artificial intelligence (AI) is a controversial topic, with a lot of disagreement as to what actually constitutes real intelligence. Many argue that a computer like HAL is way off in the future, while others would maintain that it's closer than we think.

Whichever opinion one holds, it's clear that the science of AI is moving forward anyway. Today we may even be seeing the beginnings of what could one day lead to a HAL-like computer. In fact, perhaps we already have HAL's great-great-grandmother! In 2005, the International Space Station got a talking computer called Clarissa to help the astronauts by reading instruction manuals to them. Maggie McKee explains it in this article from New Scientist:

Space station gets HAL-like computer      [published June 2005]

A voice-operated computer assistant is set to be used in space for the first time on Monday – its operators hope it proves more reliable than "HAL", the treacherous speaking computer in the movie 2001: A Space Odyssey.

Called Clarissa, the program will initially talk astronauts on the International Space Station through tests of onboard water supplies. But its developers hope it will eventually be used for all computer-related work on the station.

Clarissa was designed with input from astronauts. They said it was difficult to perform the 12,000 procedures necessary to maintain the ISS and conduct scientific experiments while simultaneously reading through lengthy instruction manuals.

"Just try to analyze a water sample while scrolling through pages of a procedure manual displayed on a computer monitor while you and the computer both float in microgravity," says US astronaut Michael Fincke, who spent six months on the station in 2004.

Clarissa queries astronauts about the details of what they need to accomplish in a particular procedure, then reads through step-by-step instructions. Astronauts control the program using simple commands like "next" or more complicated phrases, such as "set challenge verify mode on steps three through fourteen".
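To get a feel for how such a voice-driven procedure navigator might work, here is a toy sketch in Python. It is purely illustrative: the class name, the digit-based command grammar, and the "challenge-verify" bookkeeping are my own assumptions, not NASA's actual design (Clarissa, for one, handled spelled-out numbers like "three through fourteen").

```python
import re

class ProcedureNavigator:
    """Toy sketch of a Clarissa-style step reader controlled by voice commands."""

    def __init__(self, steps):
        self.steps = steps              # ordered list of instruction strings
        self.index = 0                  # current step (0-based)
        self.challenge_verify = set()   # steps flagged for confirm-before-advance

    def current_step(self):
        return f"Step {self.index + 1}: {self.steps[self.index]}"

    def handle(self, command):
        command = command.lower().strip()
        if command == "next":
            self.index = min(self.index + 1, len(self.steps) - 1)
            return self.current_step()
        if command == "previous":
            self.index = max(self.index - 1, 0)
            return self.current_step()
        # e.g. "set challenge verify mode on steps 3 through 14"
        m = re.match(r"set challenge verify mode on steps (\d+) through (\d+)", command)
        if m:
            lo, hi = int(m.group(1)), int(m.group(2))
            self.challenge_verify.update(range(lo - 1, hi))
            return f"Challenge-verify mode set on steps {lo} through {hi}."
        return "Command not recognized."
```

The point of the sketch is the hands-free control loop: the astronaut never touches a screen, and the navigator keeps its own notion of where in the procedure they are.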

Kim Farrell, Clarissa project manager, simulates on-orbit use of the system in the International Space Station mock-up at Ames Research Center.

"The idea was to have a system that would read steps to them under their control, so they could keep their hands and eyes on whatever task they were doing," says Beth Ann Hockey, a computer scientist who leads the project at NASA's Ames Research Center in Moffett Field, California, US.

That capability "will be like having another crew member aboard", says Fincke. (You can see Clarissa in action in an mp4 video hosted on this NASA page.) [...]

Clarissa uses an "open mic": it understands multiple astronauts' voices, recognizes when astronauts are talking to each other rather than to it, copes with some ambient noise, and has a high voice recognition rate of around 94%, making it a very useful and professional tool. You can read the full article for more details, and there are more videos of Clarissa on NASA's web site:

Clarissa NASA page with photos and videos

Clarissa is cutting-edge technology, and is leading the way for future voice recognition and text-to-speech applications closer to home.

Beth Ann Hockey is the project leader of the Clarissa project.
The Clarissa software program also borrows her voice.


I find the Clarissa project interesting not only for what it does now, but for what it has the potential to do in the future. The following is an excerpt from an interview with the project's leader, Beth Ann Hockey, who gives us some insight into where this is going:

WHO'S WHO AT NASA: Beth Ann Hockey

[...] NTB: How will NASA utilize Clarissa?

Hockey: It could be used widely in any area of NASA that uses procedures like these; however, spoken-language and spoken-dialogue technologies are much more general than that and can be used in all sorts of other places. For example, we had some conversations about using it for ground-maintenance crews and for developing applications for use in mission control. Any time you want to have your hands and eyes free, it will be a win. There are many times that it could be beneficial simply because you’re moving around. If you had wireless technology, plus the spoken-dialogue technology, you could move around and still be accessing information that you need.

NTB: How did Xerox contribute to this project?

Hockey: In the realistic-experimental version that we have, we worked on some technology with Xerox because one of the big ideas behind this was to have your hands and eyes free; we did not want the user to have to push a button to indicate that speech recognition should start, which is the way that some systems are designed. We needed to have the speech recognition running constantly. The system has to decide whether the speech that it’s hearing is directed at it – is it a command it should understand – or is it something it should ignore.

We got together with Jean-Michel Renders from Xerox Research Centre Europe, an expert on kernel methods, and we believed that those methods would do a better job on this problem. We worked with Renders on using the kernel methods to make this open-microphone decision, and we cut the error rate in half.
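The open-microphone decision Hockey describes is essentially a binary classification problem: is this utterance directed at the system, or should it be ignored? The real system used kernel-method classifiers; the sketch below is only a crude stand-in that scores an utterance by how much of it overlaps with a command vocabulary (the vocabulary and threshold are invented for illustration).

```python
# Words the toy system treats as evidence the speaker is addressing it.
COMMAND_VOCAB = {
    "next", "previous", "repeat", "set", "challenge", "verify",
    "mode", "on", "off", "steps", "through", "read", "step",
}

def directed_at_system(utterance, threshold=0.5):
    """Accept the utterance if at least `threshold` of its words look like commands."""
    words = utterance.lower().split()
    if not words:
        return False
    overlap = sum(1 for w in words if w in COMMAND_VOCAB) / len(words)
    return overlap >= threshold
```

A keyword-overlap score like this would perform far worse than the kernel methods the article describes, but it shows the shape of the problem: the recognizer runs constantly, and a separate accept/reject decision filters out astronaut-to-astronaut chatter.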

NTB: What are possible commercial applications for Clarissa?

Hockey: I just gave a talk at the V-World Summit, which is held by Nuance Communications for their developers and customers. I was invited because they see what we’re doing as the next-generation of applications in their area. Nuance is the speech-recognition engine that we use. We build the language understanding in addition to that engine. Nuance is the first stage in what we use; it takes your acoustic signal and makes a good guess at the words that signal might have been. Nuance’s main business is supporting telephone-bank-type applications. For example, if you call an airline to check flight information or if you have an automated banking application that you interact with, those are probably built with Nuance. These are the types of applications that now are commercially common.

The application that we did for the astronauts is more complicated in a lot of ways when compared to those systems, which feel like a "menu only" that you're talking to. Our system feels like you're having a conversation with somebody who may not be the brightest person, but it feels more like a conversation. As more of these menu-type commercial applications appear and people get used to them, it's natural to move toward a more conversational technology, especially as the technology keeps maturing.

Aside from the menu-type uses for this technology, the navigating of procedures applications could be natural for doing any kind of equipment maintenance (i.e., airlines). For example, tasks in which you'd have to have your hands doing something while you're lying underneath a piece of equipment and it's not convenient to stop and scroll through a computer screen or flip through papers. So there already are plenty of commercial applications; we're just carrying it to the next level.

I’ve been talking mostly about this procedure navigator, while in fact the component technologies in that are even more widely applicable. In particular, the other project on which I am the lead is called Regulus. We’re developing an open-source tool kit to try and make the creation of spoken-dialogue interfaces more accessible to regular developers. Currently, you have to have someone with expertise in language technology to be able to do this well, but we’re trying to make it so that people can take this toolkit and make their own simple-to-moderate interfaces. It’s open source – people can simply download it. We also are working on a book that will include tutorial materials on how to use that system, which should be coming out next year. If people are interested in that, they should contact us. [...]
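The kind of thing a grammar-based toolkit like Regulus aims to make routine is mapping recognized word strings to structured intents with declarative rules, rather than requiring a language-technology expert to hand-build the understanding layer. The tiny sketch below illustrates the idea; the patterns and intent names are invented for illustration and have nothing to do with Regulus's actual grammar formalism.

```python
import re

# A declarative "grammar": each pattern maps an utterance to an intent tuple.
GRAMMAR = [
    (r"read step (\d+)", lambda m: ("READ_STEP", int(m.group(1)))),
    (r"(next|previous|repeat)", lambda m: ("NAVIGATE", m.group(1))),
    (r"what is the current step", lambda m: ("QUERY_STEP", None)),
]

def parse(utterance):
    """Return the first matching intent, or UNKNOWN if nothing matches."""
    utterance = utterance.lower().strip()
    for pattern, build in GRAMMAR:
        m = re.fullmatch(pattern, utterance)
        if m:
            return build(m)
    return ("UNKNOWN", utterance)
```

Adding a new capability means adding one line to the grammar table, which is roughly the kind of accessibility for "regular developers" that Hockey describes.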

I did a post earlier about Dragon NaturallySpeaking 9 voice recognition software, which uses the Nuance speech engine Ms. Hockey speaks of. The Nuance engine is impressive, and judging from the consumer reviews, it's regarded as the best voice recognition speech engine available. A close runner-up is the Microsoft speech engine that's bundled with Windows Vista, which consumers say is nearly as accurate as Nuance's latest version.

Have you noticed the Microsoft TV commercials lately, regarding software driven by voice commands? Voice recognition and Text-To-Speech (TTS) technologies promise to be two of the Next Big Things in computer technology.

Combine them with Artificial Intelligence, and we are on our way to a HAL-like computer somewhere in our future.
     

Monday, February 18, 2008

Voice Recognition Software; is it there yet?

Ten years ago I tried using Dragon NaturallySpeaking and ViaVoice software, and was not greatly impressed. You had to do all this work training the program, and there would still be errors you would have to correct. It seemed easier just to type.

But many years later, the accuracy of Dragon Naturally Speaking is supposed to be much better. Could it really be as good as it looks in this demo?

Dragon NaturallySpeaking 9 Demo

I've been told by one person that he saves up to two hours a day since giving up typing in favor of voice recognition software. I must say, it sounds pretty good.