Monday, March 24, 2008

Clarissa: a HAL-type computer for the ISS?

Much of the technology portrayed in the famous science fiction film "2001: A Space Odyssey" had failed to materialize by the year 2001. The HAL 9000 computer in the movie is no exception.

Artificial intelligence (AI) is a controversial topic, with a lot of disagreement as to what actually constitutes real intelligence. Many argue that a computer like HAL is way off in the future, while others would maintain that it's closer than we think.

Whichever opinion one holds, it's clear that the science of AI is moving forward anyway. Today we may even be seeing the beginnings of what could one day lead to a HAL-like computer. In fact, perhaps we already have HAL's great-great-grandmother! In 2005, the International Space Station got a talking computer called Clarissa to help the astronauts by reading instruction manuals to them. Maggie McKee explains it to us in this article from New Scientist:

Space station gets HAL-like computer [published June 2005]

A voice-operated computer assistant is set to be used in space for the first time on Monday – its operators hope it proves more reliable than "HAL", the treacherous speaking computer in the movie 2001: A Space Odyssey.

Called Clarissa, the program will initially talk astronauts on the International Space Station through tests of onboard water supplies. But its developers hope it will eventually be used for all computer-related work on the station.

Clarissa was designed with input from astronauts. They said it was difficult to perform the 12,000 procedures necessary to maintain the ISS and conduct scientific experiments while simultaneously reading through lengthy instruction manuals.

"Just try to analyze a water sample while scrolling through pages of a procedure manual displayed on a computer monitor while you and the computer both float in microgravity," says US astronaut Michael Fincke, who spent six months on the station in 2004.

Clarissa queries astronauts about the details of what they need to accomplish in a particular procedure, then reads through step-by-step instructions. Astronauts control the program using simple commands like "next" or more complicated phrases, such as "set challenge verify mode on steps three through fourteen".
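To make that command style concrete, here is a minimal Python sketch of how already-recognized text like the phrases above might be mapped to actions. It is not Clarissa's actual code: the names parse_command and ProcedureNavigator, the number-word table, and the sample steps are all invented for illustration, and only the two commands quoted in the article are handled.

```python
import re

# Hypothetical number-word table covering "one" through "twenty".
_WORDS = ("one two three four five six seven eight nine ten eleven twelve "
          "thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty")
NUMBER_WORDS = {word: n for n, word in enumerate(_WORDS.split(), start=1)}

_VERIFY = re.compile(r"set challenge verify mode on steps (\w+) through (\w+)")


def parse_command(utterance):
    """Return (command, args) for an understood utterance, else None."""
    text = " ".join(utterance.lower().split())  # normalize case and spacing
    if text == "next":
        return ("next", {})
    match = _VERIFY.fullmatch(text)
    if match:
        first = NUMBER_WORDS.get(match.group(1))
        last = NUMBER_WORDS.get(match.group(2))
        if first is not None and last is not None and first <= last:
            return ("challenge_verify", {"first": first, "last": last})
    return None


class ProcedureNavigator:
    """Toy step reader: starts on step 1 and advances on "next"."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.current = 0           # index of the step last read aloud
        self.verify_steps = set()  # 1-based step numbers to double-check

    def handle(self, utterance):
        parsed = parse_command(utterance)
        if parsed is None:
            return None  # not a command this sketch understands
        command, args = parsed
        if command == "next":
            if self.current >= len(self.steps) - 1:
                return "End of procedure"
            self.current += 1
            return "Step %d: %s" % (self.current + 1, self.steps[self.current])
        # Only "challenge_verify" remains.
        self.verify_steps = set(range(args["first"], args["last"] + 1))
        return "Challenge verify on for steps %d through %d" % (
            args["first"], args["last"])
```

A real system would sit behind a speech recognizer and a much richer grammar; the point here is only that a spoken phrase ends up as a small structured command that the procedure reader can act on.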

Kim Farrell, Clarissa project manager, simulates on-orbit use of the system in the International Space Station mock-up at Ames Research Center.

"The idea was to have a system that would read steps to them under their control, so they could keep their hands and eyes on whatever task they were doing," says Beth Ann Hockey, a computer scientist who leads the project at NASA's Ames Research Center in Moffett Field, California, US.

That capability "will be like having another crew member aboard", says Fincke. (You can see Clarissa in action in an mp4 video hosted on this NASA page.) [...]

Clarissa uses an "open mic" and can understand the voices of multiple astronauts, recognize when astronauts are talking to each other rather than to it, and cope with some ambient noise. Its voice recognition rate is around 94%, which makes it a very useful and professional tool. You can read the full article for more details, and there are more videos of Clarissa on NASA's web site:

Clarissa NASA page with photos and videos

Clarissa is cutting-edge technology, and is leading the way for future voice recognition and text-to-speech applications closer to home.

Beth Ann Hockey is the project leader of the Clarissa project.
The Clarissa software program also borrows her voice.

I find the Clarissa project interesting not only for what it does now, but for what it has the potential to do in the future. The following is an excerpt from an interview with the project's leader, Beth Ann Hockey, who gives us some insight into where this is going:

WHO'S WHO AT NASA: Beth Ann Hockey

[...] NTB: How will NASA utilize Clarissa?

Hockey: It could be used widely in any area of NASA that uses procedures like these; however, spoken-language and spoken-dialogue technologies are much more general than that and can be used in all sorts of other places. For example, we had some conversations about using it for ground-maintenance crews and for developing applications for use in mission control. Any time you want to have your hands and eyes free, it will be a win. There are many times that it could be beneficial simply because you’re moving around. If you had wireless technology, plus the spoken-dialogue technology, you could move around and still be accessing information that you need.

NTB: How did Xerox contribute to this project?

Hockey: In the realistic-experimental version that we have, we worked on some technology with Xerox because one of the big ideas behind this was to have your hands and eyes free; we did not want the user to have to push a button to indicate that speech recognition should start, which is the way that some systems are designed. We needed to have the speech recognition running constantly. The system has to decide whether the speech that it’s hearing is directed at it – is it a command it should understand – or is it something it should ignore.

We got together with Jean-Michel Renders from Xerox Research Centre Europe, an expert on kernel methods, and we believed that those methods would do a better job on this problem. We worked with Renders on using the kernel methods to make this open-microphone decision, and we cut the error rate in half.
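For readers curious what an open-microphone decision could look like in code, here is a toy Python sketch. The interview says only that "kernel methods" were used; it does not say which ones or on what features. So everything below is an assumption: a kernel perceptron (one of the simplest kernel methods), bag-of-words features, an RBF kernel, and a handful of made-up training utterances labeled +1 (spoken to the system) or -1 (crew chatter).

```python
import math
from collections import Counter

# Invented training data: +1 = directed at the system, -1 = ignore.
TRAINING = [
    ("next", 1),
    ("hand me the wrench", -1),
    ("go to step three", 1),
    ("what time is dinner", -1),
    ("repeat step two", 1),
    ("did you call houston", -1),
]


def features(utterance):
    """Bag-of-words counts; a stand-in for real recognizer features."""
    return Counter(utterance.lower().split())


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel on sparse word-count vectors."""
    sq_dist = sum((a[k] - b[k]) ** 2 for k in set(a) | set(b))
    return math.exp(-gamma * sq_dist)


class KernelPerceptron:
    def __init__(self, kernel):
        self.kernel = kernel
        self.support = []  # (features, label) pairs that caused mistakes

    def score(self, x):
        return sum(y * self.kernel(sx, x) for sx, y in self.support)

    def fit(self, examples, epochs=20):
        for _ in range(epochs):
            mistakes = 0
            for x, y in examples:
                if y * self.score(x) <= 0:  # wrong side, or undecided
                    self.support.append((x, y))
                    mistakes += 1
            if mistakes == 0:
                break  # every training example is classified correctly

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1


def train_open_mic_classifier():
    clf = KernelPerceptron(rbf_kernel)
    clf.fit([(features(u), y) for u, y in TRAINING])
    return clf


def is_directed_at_system(clf, utterance):
    return clf.predict(features(utterance)) == 1


if __name__ == "__main__":
    clf = train_open_mic_classifier()
    for phrase in ("next step", "hand me the probe"):
        print(phrase, "->", is_directed_at_system(clf, phrase))
```

With six training phrases this is nowhere near a usable filter, and a production system would presumably draw on acoustic and recognizer confidence information as well. It only illustrates the shape of the problem: every utterance gets a score, and the system acts only on those that land on the "talking to me" side.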

NTB: What are possible commercial applications for Clarissa?

Hockey: I just gave a talk at the V-World Summit, which is held by Nuance Communications for their developers and customers. I was invited because they see what we’re doing as the next generation of applications in their area. Nuance is the speech-recognition engine that we use. We build the language understanding in addition to that engine. Nuance is the first stage in what we use; it takes your acoustic signal and makes a good guess at the words that signal might have been. Nuance’s main business is supporting telephone-bank-type applications. For example, if you call an airline to check flight information or if you have an automated banking application that you interact with, those are probably built with Nuance. These are the types of applications that now are commercially common.

The application that we did for the astronauts is more complicated in a lot of ways when compared to those systems, which feel like a “menu only” that you’re talking to. Our system feels like you’re having a conversation with somebody who may not be the brightest person, but it feels more like a conversation. It’s natural, as there are more of these kinds of menu-type commercial applications out there and people get used to them, to move toward a more conversational technology. This is true especially as the technology keeps maturing.

Aside from the menu-type uses for this technology, the navigating of procedures applications could be natural for doing any kind of equipment maintenance (e.g., airlines). For example, tasks in which you’d have to have your hands doing something while you’re lying underneath a piece of equipment and it’s not convenient to stop and scroll through a computer screen or flip through papers. So there already are plenty of commercial applications; we’re just carrying it to the next level.

I’ve been talking mostly about this procedure navigator, while in fact the component technologies in that are even more widely applicable. In particular, the other project on which I am the lead is called Regulus. We’re developing an open-source tool kit to try and make the creation of spoken-dialogue interfaces more accessible to regular developers. Currently, you have to have someone with expertise in language technology to be able to do this well, but we’re trying to make it so that people can take this toolkit and make their own simple-to-moderate interfaces. It’s open source – people can simply download it. We also are working on a book that will include tutorial materials on how to use that system, which should be coming out next year. If people are interested in that, they should contact us. [...]

I did a post earlier about Dragon NaturallySpeaking 9 voice recognition software, which uses the Nuance speech engine Ms. Hockey speaks of. The Nuance engine is impressive, and judging from the consumer reviews, it's regarded as the best voice recognition engine available. A close runner-up is the Microsoft speech engine that's bundled with Windows Vista, which consumers say is nearly as accurate as Nuance's latest version.

Have you noticed the Microsoft TV commercials lately, regarding software driven by voice commands? Voice recognition and Text-To-Speech (TTS) technologies promise to be two of the Next Big Things in computer technology.

Combine them with Artificial Intelligence, and we are on our way to a HAL-like computer somewhere in our future.


Walker said...

Interesting. I had a friend, way back in 1989 or so who did his dissertation on computer-human conversation.

In that case, the dialogue took place all on a keyboard, of course, and involved a book. The computer would ask questions about the book and a student would type answers. Then the computer had to be able to analyze the written answer and follow up. Seems to me the questions were yes/no. So the object was to see if they could write a program so good that a student would not be able to tell whether he was speaking to a computer or a human. I think it took about five minutes before a student knew they were talking to a computer.

Similar thing happened to me last month on a computer support site. I was asking questions and suddenly I realized I was getting online support from a freaking computer. I just clicked out.

Chas said...

The last link in my post is to the on-line version of "Ultra Hal". It's a program you can talk to by typing, but you can also add a voice to it and have it talk to you. I downloaded it, and may do a post about it in the future.

Such "talkbots" are becoming increasingly common. The Ultra Hal link has a demo talkbot that is dedicated to talking only about hamsters. It's a useful source of information about hamsters, and such a talkbot could be programmed to help with any number of topics. We are going to see them more and more, and they will get better and better.

But the hamster talkbot isn't the only one on the site; there is a more general one that actually can learn from what you say and try to converse. The results aren't always convincing, but sometimes it's brilliant. Ultra Hal won an award last year for being the most "convincing" human-like talking computer.

It's a science that's in its infancy, and I think it's going to grow into something really impressive.