Giving Voices to Multimodal Applications

Abstract

Speech interaction is important and useful in a wide range of applications. It is a natural way of interacting and is easy for most people to use. Developing speech-enabled applications is a major challenge, and it grows when several languages are required, a common scenario in Europe, for example. Tackling this challenge requires methods and tools that make speech features easier to deploy, equipping developers with versatile means to include speech interaction in their applications. In addition, only a limited variety of voices is available (sometimes only one per language), which makes it hard to satisfy user preferences and hinders a deeper exploration of how well voices suit specific applications and users. In this article, we present some of our contributions to these issues: (a) our generic modality, which encapsulates the technical details of using speech synthesis; (b) the process followed to create four new voices, two young adult and two elderly; and (c) some initial results exploring user preferences regarding the created voices. The preliminary studies targeted groups of both young and older adults and addressed: (a) evaluation of the intrinsic properties of each voice; (b) observation of users while using speech-enabled interfaces and elicitation of qualitative impressions regarding the chosen voice and the impact of speech interaction on user satisfaction; and (c) ranking of voices according to preference. The collected results, albeit preliminary, yield some evidence of the positive impact speech interaction has on users, at different levels. Additionally, the results show interesting differences in the voice preferences expressed across age groups and genders.

Bibliographic record

Source
International Conference on Human-Computer Interaction | 2015 | pp. 273-283 | 11 pages
Authors
Nuno Almeida; Antonio Teixeira; Ana Filipa Rosa; Daniela Braga; Joao Freitas; Miguel Sales Dias; Samuel Silva; Jairo Avelar; Cristiano Chesi; Nuno Saldanha;
Original format: PDF
Keywords
Synthetic voices; Speech output; Multimodal interaction; Age effects;

Similar literature

1. VoiceXML dialog system of the multimodal IP-Telephony - The application for voice ordering service [J] . Min-Jen Tsai. Expert Systems with Applications . 2006, No. 4
2. Multimodal processing of emotional information in 9-month-old infants I: Emotional faces and voices [J] . Otte R. A., Donkers F. C. L., Braeken M. A. K. A., Brain and Cognition . 2015, No. apra
3. From evolutionary roots to a broad spectrum of complex human emotions: Future research perspectives in the field of emotional vocal communication. Reply to comments on "Emotional voices in context: A neurobiological model of multimodal affective information processing" [J] . Brück C., Kreifelts B., Wildgruber D. Physics of Life Reviews . 2012, No. 1
4. Giving Voices to Multimodal Applications [C] . Nuno Almeida, Antonio Teixeira, Ana Filipa Rosa, International conference on human-computer interaction . 2015
5. Students Developing Voices in New Learning Ecologies: Voice, Identity, Position and Function as a Framework to Support Multimodal Investigations of Learning Mathematics over Multiple Timescales [D] . El Chidiac, Fady 2018
6. Application of Poincare-Mapping of Voiced-Speech Segments for Emotion Sensing [O] . Krzysztof Ślot, Łukasz Bronakowski, Jaroslaw Cichosz, 2009
7. Learning Multimodality through Genre-Based Multimodal Texts Analysis: Listening to Students’ Voices [O] . Fuad Abdullah, Soni Tantan Tandiana, Yuyus Saputra 2020