Multimodal Dialogue Platform – SiAM-dp
Applications in the car moved beyond steering aids and radio tuners long ago. Rich driver assistance and information systems increase comfort and deliver substantial added value. To preserve intuitive and safe operation despite the growing number and complexity of functions, we investigate novel interaction methods that combine different modalities such as touch, speech, eye gaze, and finger gestures. Control is thereby not limited to systems in the driver's own car, but can also be used to interact with the outside environment.
A major part of our research in this area is the development of a multimodal dialogue platform that helps us investigate particular topics and also serves as a solid basis for building dialogue applications for the car. The key features of our dialogue platform SiAM-dp (Situation-Adaptive Multimodal Dialogue Platform) include:
- Integrated coverage of multimodality concepts (fusion and fission)
- Out-of-the-box support for several devices representing common modalities
- Intelligent dialogue system behavior through semantic interpretation of user input
- Situation adaptivity through dynamic behavior depending on user and context
- Personalized driver assistance
- Consideration of user resources (e.g. cognitive load, time)
- Offline evaluation of dialog runs gives early insights without a costly user study
- Multi-party support allows inclusion of passengers in the dialogue discourse
- Ability to dynamically connect to external devices as output devices, e.g. electronic road signs and billboards
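To make the fusion concept from the feature list concrete, here is a minimal, hypothetical sketch of late multimodal fusion: input events from different modalities are merged when they fall within a short time window, so a deictic speech command ("take me there") can be resolved against a simultaneous pointing or touch event. All names, payload fields, and the time threshold are illustrative assumptions, not SiAM-dp's actual API.

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str      # e.g. "speech", "touch", "gaze" (illustrative labels)
    timestamp: float   # seconds since session start
    payload: dict

# Illustrative threshold: events further apart than this are not fused.
FUSION_WINDOW = 1.5  # seconds

def fuse(speech: InputEvent, pointing: InputEvent) -> dict:
    """Resolve a deictic speech command against a pointing/touch event."""
    if abs(speech.timestamp - pointing.timestamp) > FUSION_WINDOW:
        # No temporally plausible referent: the deictic reference stays open.
        return {"intent": speech.payload.get("intent"), "target": None}
    return {"intent": speech.payload.get("intent"),
            "target": pointing.payload.get("object")}

speech = InputEvent("speech", 10.2, {"intent": "navigate_to",
                                     "utterance": "take me there"})
touch = InputEvent("touch", 10.8, {"object": "poi_gas_station_42"})
print(fuse(speech, touch))
# → {'intent': 'navigate_to', 'target': 'poi_gas_station_42'}
```

A real fusion engine would of course handle more than two events, score competing referents, and feed unresolved references back into the dialogue (fission then distributes the system's response across output modalities), but the time-window principle shown here is the common core.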
Speech as an input and output modality is of particular relevance in the automotive context, since it represents a “hands-free, eyes-free” modality. Numerous studies show that speech (or the broader class of audio in the output case) is the least distracting of the common modalities with respect to the driving task, and therefore should be considered the default for many situations.
DFKI has longstanding experience in designing command-based and natural-language dialogue systems. The latter have recently gained much greater user awareness through smartphones, resulting in requests for similar interfaces in the vehicle. Through our collaboration with Nuance, IBM, and other speech technology partners in different projects, we can offer the best speech engines (speech recognition and speech synthesis) combined with advanced dialogue system technology.
Aside from the actual dialogue, we have often successfully employed paralinguistic information to personalize dialogue applications. Speech contains cues about gender, age, emotion, stress, etc. that can be approximately extracted using pattern recognition methods.
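As a toy illustration of such paralinguistic feature extraction (not the platform's actual method), the sketch below estimates the fundamental frequency (F0) of a voiced frame via autocorrelation; F0 is one of the raw cues that trained classifiers combine with many other features to approximate speaker attributes such as gender or stress. The sample rate and search range are illustrative assumptions.

```python
import numpy as np

def estimate_f0(frame, sample_rate=16000, f0_min=60, f0_max=400):
    """Estimate fundamental frequency of a voiced frame by autocorrelation."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sample_rate / f0_max)   # smallest lag of interest
    hi = int(sample_rate / f0_min)   # largest lag of interest
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sample_rate / lag

# A synthetic 220 Hz tone stands in for a voiced speech frame.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr          # 40 ms frame
frame = np.sin(2 * np.pi * 220 * t)
f0 = estimate_f0(frame, sr)
print(round(f0))  # close to 220 Hz
```

Real paralinguistic recognition uses far richer feature sets (spectral, prosodic, voice-quality) and statistical models trained on labeled corpora; this sketch only shows where such pipelines start.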
Further research challenges include:
- Usage of statistical language models for modeling natural language interaction
- Situation-adaptive speech output, e.g. with respect to choice of words and voice parameters
- Cross-modal correction of speech recognition errors
- Context-sensitive treatment of speech recognition errors depending on the type of content
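One way the last two challenges above can interact is sketched below: the recognizer's n-best list is rescored against the current dialogue context, so a hypothesis that mentions an entity visible on screen (a kind of cross-modal evidence) can override the recognizer's raw ranking. The scoring scheme and bonus value are illustrative assumptions, not SiAM-dp's actual error-handling strategy.

```python
def rescore(nbest, visible_entities, bonus=0.2):
    """Re-rank ASR hypotheses using on-screen context.

    nbest: list of (hypothesis_text, confidence) pairs from the recognizer.
    visible_entities: names currently shown on the display (e.g. contacts).
    """
    def score(item):
        text, conf = item
        in_context = any(e.lower() in text.lower() for e in visible_entities)
        return conf + (bonus if in_context else 0.0)
    return sorted(nbest, key=score, reverse=True)

# The recognizer slightly prefers "call home", but "Holmes" is a
# contact currently shown on screen, so context flips the ranking.
nbest = [("call home", 0.55), ("call Holmes", 0.50)]
context = ["Holmes", "Watson"]
print(rescore(nbest, context)[0][0])  # → call Holmes
```

A full treatment would also let the user correct a misrecognition through another modality, e.g. by touching the intended alternative on screen, which is what cross-modal correction refers to above.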
Micro gestures, i.e. gestures where the hands stay on the steering wheel and only individual fingers move, can be viewed in a similarly positive light as speech when it comes to the driving task. Thanks to hardware progress over the last few years, micro-gesture recognition can be integrated into cars with minimal space requirements. The main challenge for expanding use and acceptance now consists of developing a cohesive overall concept for gesture control in vehicles. The Automotive IUI Group is currently investigating these topics in usability studies, including the following questions:
- Which vehicle functions are useful to control?
- Which gestures can be used intuitively, and how can the gesture set be expanded?
- How can gestures be combined with other modalities, e.g. gestures and speech?
- Which physical interaction parameters need to be considered when used on the steering wheel (e.g. finger span)?
- How distracting is gesture interaction?
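Once the studies above have settled on a gesture set, mapping recognized micro gestures to vehicle functions is conceptually a simple dispatch step, sketched below. The gesture names and target functions are purely hypothetical placeholders; the actual set would be an outcome of the usability studies, not an input to them.

```python
# Hypothetical mapping from recognized micro gestures (hands stay on
# the wheel, individual fingers move) to vehicle functions.
GESTURE_MAP = {
    "index_swipe_left": "radio_previous_station",
    "index_swipe_right": "radio_next_station",
    "thumb_tap": "accept_call",
    "two_finger_lift": "toggle_voice_input",
}

def dispatch(gesture: str) -> str:
    """Return the vehicle function for a recognized gesture, or 'ignore'.

    Falling back to 'ignore' matters in the car: an unrecognized or
    accidental finger movement must never trigger a function.
    """
    return GESTURE_MAP.get(gesture, "ignore")

print(dispatch("thumb_tap"))  # → accept_call
```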