What Is VoiceXML & How Does It Work?

Moving your company’s infrastructure to an IP-based environment isn’t a simple chore, particularly if there’s any customer service involved. Unless you want to have a huge call center handling every aspect of customer support, you’ll need to automate processes to some extent.

VoiceXML can help that automation take place. Though not a new technology, VoiceXML is gaining steam as more and more corporations shift to an IP-based infrastructure. Basically, VoiceXML is an XML grammar (not a programming language in its own right) for creating applications that make Web-based information accessible to users via voice and telephone.

VoiceXML is not a proprietary technology belonging to a single corporation. Instead, it is managed by the VoiceXML Forum, whose members include the likes of AT&T, IBM, Lucent and Motorola. Because VoiceXML is positioned as a Web standard, it has been submitted to the World Wide Web Consortium (W3C) for approval: Version 1.0 was approved in March 2000 and version 2.0 was approved last month. (If you’re at all interested in VoiceXML, you can check out the specs on the W3C site.)

Connecting the Dots

There have been many efforts to bridge the gap between computer-using and telephone-using consumers. The greatest distance is between WAP (Wireless Application Protocol), a text-based technology, and VoiceXML, which uses voice technology. In some ways the two technologies are similar in terms of how they are implemented, but while WAP is purely text that’s appropriate for a PDA or cellular phone, VoiceXML builds on familiar ground already used in the customer-service world: voice. Proprietary voice-recognition systems (or Interactive Voice Response systems, as they were once called) have been in use for decades, so customers are familiar with them. VoiceXML builds on this popularity with tools to bring Web content directly to the consumer.

Let’s say you’re calling Song, Delta’s low-fare subsidiary, and you need to either make a reservation or change one. Using the 800/FlySong line, callers can find flights, check schedules, compare fares and track baggage. Most airlines still require assistance from a human customer-service rep at some point in these processes, but the Tellme Networks setup delivers almost all relevant information via VoiceXML.

Similarly, we’re seeing voice portals — some national, some localized — rise as a business model. Most of these voice portals are powered by VoiceXML to some degree. Basically, these voice portals empower users to access Web data (like listings, stock quotes, phone numbers, and directions) via a natural-language interface. The numbers are with voice portals: There are more than a billion telephones in use around the world, while Internet access is limited to only 250 million PCs or so. And while PDAs may someday displace cell phones to an extent, it’s still hard to imagine a scenario more user-friendly for a consumer than being able to call in a question to a VoiceXML-based system.

How VoiceXML Works

VoiceXML is deceptively simple. Basically, when you build a VoiceXML application, you’re creating a tree that guides a caller through a series of questions mostly requiring a simple response. The application converts the responses (using voice-recognition technology) into text, leading to the next level of the tree.
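The tree-shaped call flow described above can be sketched in a few lines of Python. This is an illustration of the idea only, not VoiceXML itself: the `DialogNode` class, the `walk` function and the airline prompts are all hypothetical, and the "recognized" answers stand in for the text a voice-recognition engine would produce.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogNode:
    """One question in the call flow, plus the branches its answers lead to."""
    prompt: str
    branches: Dict[str, "DialogNode"] = field(default_factory=dict)


def walk(root: DialogNode, recognized: List[str]) -> List[str]:
    """Follow recognized responses down the tree; return the prompts played."""
    played = [root.prompt]
    node = root
    for answer in recognized:
        next_node = node.branches.get(answer.strip().lower())
        if next_node is None:
            # Unrecognized answer: replay the current prompt and stay put.
            played.append(node.prompt)
            continue
        node = next_node
        played.append(node.prompt)
    return played


# Hypothetical airline menu with two branches under the top-level question.
flight_status = DialogNode("What is your flight number?")
reservations = DialogNode("Would you like to book or change a reservation?")
root = DialogNode(
    "Say 'reservations' or 'flight status'.",
    {"reservations": reservations, "flight status": flight_status},
)
```

Calling `walk(root, ["flight status"])` plays the top-level prompt and then the flight-number question; an answer the tree does not know simply repeats the current prompt, which is roughly how a caller experiences a reprompt.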

Structurally, a VoiceXML application is fairly simple. When a session begins, a user is assigned a dialog state, and moving from topic to topic means navigating through menus. Feedback to the server is provided through forms (the same way a user enters data in the fields of an HTML form), with some data provided via voice and some via touch-tone (DTMF) key presses. More complex data input can be facilitated with ECMAScript scripting.
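To make the form idea concrete, here is a minimal sketch that generates a one-field VoiceXML document from Python's standard library. The element names (`vxml`, `form`, `field`, `prompt`, `block`, `submit`) and the built-in `digits` field type come from the VoiceXML 2.0 specification; the `build_form` helper, the form and field names, and the example.com URL are assumptions made up for illustration. A `digits` field is a reasonable choice here because it is meant to accept either spoken digits or touch-tone key presses, though built-in type support can vary by platform.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_form(form_id: str, field_name: str, prompt_text: str, submit_url: str) -> str:
    """Return a VoiceXML 2.0 document with one digits field and a submit step."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": form_id})

    # The field collects one value from the caller, much like an HTML input.
    fld = ET.SubElement(form, "field", {"name": field_name, "type": "digits"})
    ET.SubElement(fld, "prompt").text = prompt_text

    # Once the field is filled, the block runs and posts the value to the server.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "submit", {"next": submit_url, "namelist": field_name})

    return ET.tostring(vxml, encoding="unicode")


# Hypothetical flight-status form; the URL is a placeholder.
doc = build_form(
    "flight_status",
    "flight_number",
    "Please say or key in your flight number.",
    "http://example.com/status",
)
```

Printing `doc` shows the resulting markup, which could just as easily be typed by hand in an XML editor; generating it from code is only one option.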

Because VoiceXML is XML designed for a specific purpose, you can either program it directly using an XML editor or combine it with a Java wrapper. There are also a number of development tools that generate VoiceXML applications, including Cafe from BeVocal.

The Future of VoiceXML

The VoiceXML Forum continues to enhance VoiceXML. Work on version 3.0 has been going on for months. The next step for VoiceXML applications is a move toward location-based services. Tellme currently offers localized information based on the Caller ID of the caller. In the future, data will be localized based on GPS or E911 positioning data.

But the real challenges to VoiceXML won’t come from the language-development side; they will come from how society feels about providing information to a machine over the telephone. There are still many people who feel awkward about having a “conversation” with an anonymous voice over the telephone, and for those people a human voice will be the only interaction they want. In the end, VoiceXML applications won’t completely replace the call center, but they certainly can become an important part of customer service.
