Speech recognition brings gaming to next level

The Eurostars project D-BOX has demonstrated the feasibility of integrating automatic speech recognition and text-to-speech technology into gaming applications.


Results gleaned from the prototypes open the door not only to more interactive games that react to speech, but also to other apps that enable professionals who do not share a language to work collaboratively on a project. “Games are very good test cases for new technologies like this because it is easier to get people to play a game than fill out a survey,” explains D-BOX project coordinator Gregor Eigner from Mi'pu'mi Games, Austria. “Watching people play these games also provided us with feedback on how the technology is used in real-world situations, and where it can be improved.”

The results have strengthened Eigner’s belief that automatic speech recognition (ASR) and text-to-speech (TTS) technologies have a future in gaming, and that Mi'pu'mi Games is now very well positioned to tap into that potential. Since completion of the project, the firm has grown from 20 staff to around 30. But the experience has also underlined the challenges that lie ahead: large amounts of data were needed to develop even the project’s basic prototype, and moving beyond simple commands such as ‘give’ or ‘go’ will be difficult.

“While we are confident that this technology will be embedded in games in the future, there is still a huge amount of work that needs to be done,” agrees Eigner. “For example, last year we released a game with 25 000 lines of text; if we wanted to translate this into speech, this would involve considerable additional production efforts. When we find the right fit, we will definitely apply these technologies though, both for gaming and business applications. The technology is ready; we are just waiting for the right opportunity.”

Viable prototypes

The project brought together a number of SMEs to develop embedded multilingual conversational agents – dialogue boxes – for interactive games. For example, one SME partner with expertise in telephone speech automation had previously built a database for referee reports submitted to the German Football Association.

“The referee would send his post-match report by phone to an automated database,” explains Eigner. “This is the sort of artificial intelligence (AI) technology that is being used in call centres.” Other partners created an interface to connect users and developed automatic speech recognition (ASR) and text-to-speech (TTS) modules. Mi'pu'mi Games’ role was to integrate all these technologies into test-case game prototypes.

The first prototype was a simple quiz game in which users had to guess which famous personality the computer was impersonating by asking it simple questions. “From there we moved to our proof-of-concept prototype, which was a multiplayer game that involved two people trapped on a space station,” says Eigner. “The only way they could escape was by communicating with each other through the ship’s computer – this alludes to the way communication works on the spaceship in the movie ‘2001: A Space Odyssey.’”

What made this prototype unique was that it was multilingual. One player could play in German while the other could play in French or English. “This shows that a group of people that don’t share a common language can communicate through an application like this,” says Eigner. “This is viable for games, but also has great potential for business and work applications.”

Looking ahead, Eigner believes that involvement in the D-BOX project has enhanced the firm’s credibility, not only as a gaming company but also as an engineering company. “The project has enabled us to show that we can move beyond game development,” he says. “There are few companies in game development that have worked with both ASR and TTS technology.”