A Practical Guide to Voice User Interface Design
Ahmed Bouzid, Weiye Ma

Whether you're a new or an experienced designer of conversational voice-first experiences, this handy reference provides actionable answers to key aspects of eyes-busy, hands-busy, voice-only user interfaces. Designed as a companion to books about conversational voice design, this guide covers the important details of eyes-free, hands-free, voice-only interfaces delivered by Amazon Echo, Google Nest, and a variety of in-car experiences.
Authors Ahmed Bouzid and Weiye Ma provide far-field voice best practices and recommendations in a manner similar to The Elements of Style, the popular American English writing style guide. Like that book, The Elements of Voice First Style provides direct, succinct explanations that focus on the essence of each topic. You'll find answers quickly without having to spend time searching through other sources.
Who Should Read This Book?
The target readers of this book are budding and practicing voicebot designers in the newly emerging technology space of far-field voice, as delivered by platforms such as Amazon Echo and Google Assistant, and hearable/speakable technologies, such as Apple’s AirPods. The book can also be useful for those who design IVR systems, but only to the extent that those systems are used eyes-free and hands-free.

While this book is meant primarily to help voicebot designers think through and make sound design decisions, we have written it explicitly to be highly readable and jargon-free, so that it is also accessible to the colleagues who work with a designer: user experience (UX) researchers, product managers, developers, testers, marketers, and business development professionals.
Why We Wrote This Book
This book aims to provide direct answers to questions such as “How do I design an effective opening interaction with a voicebot?” or “What should I keep in mind as I design for failures?” or “What are some best practices for designing a conversational voice help system?” Answers to such questions can sometimes be found in other books, but the reader usually has to look hard for them and may need to consult several books before finding them. This book pulls all such answers into one text and focuses on addressing those questions directly and succinctly.
However, this book does not pretend by any means to provide final, immovable, timelessly frozen answers. Our aim instead is to crystallize the crucial questions designers should ask themselves when they undertake the work, and then to provide our answers, drawing on our decades-long experience designing and deploying voicebots. For instance, the designer needs to consider carefully how a conversation opens: the first few seconds of an interaction are crucial and can spell the success or failure of the whole exchange. Someone who has never designed a voicebot before may not even be aware of how critical those opening moments are. It may also not occur to that designer that the first-time user and the frequent user must be engaged differently, that prompts should be crafted so the user knows what to say the moment the prompt completes, or that there are time-proven techniques for writing effective failure-recovery prompts. Teaching the designer how to critically grapple with the many challenges of designing voicebots is our main goal, not prescribing fixed, nonnegotiable recipes.
This book has a second, and perhaps more ambitious, aim: to argue, and to advocate through its recommendations, that the practice of designing effective voicebots needs to free itself from the notion that the more closely a voicebot mimics a human (for instance, through the sound of its voice, the language it uses, or the “persona” it assumes), the better the user’s experience will be. We believe this position—making a voicebot sound as human as possible—is as faulty as claiming that the way an adult speaks to a baby, a child speaks to a dog, or a person speaks to someone who doesn’t share their language is an imperfect style that should be improved upon until it emulates the speech of two people fully competent in a shared language. We will advocate instead for the outlines of a style of interacting with voicebots that borrows many of the ways humans speak to each other, but that deviates, at times in significant ways, from human-to-human speech.
Dr. Ahmed Bouzid is Founder and CEO of Witlingo, a McLean, Virginia-based startup that builds products and solutions to help brands establish and grow their voice and social audio presence. Prior to Witlingo, Dr. Bouzid was Head of Echo Smart Home Product at Amazon and VP of Product and Innovation at Genesys. He is an Ambassador at the Open Voice Network and an Advisor to The Human Language Technology Group at Georgetown University. Dr. Bouzid holds 12 patents in the speech recognition and natural language processing field and was recognized as a Speech Luminary by Speech Technology Magazine.
Dr. Weiye Ma obtained her PhD in Speech Processing and Recognition from Katholieke Universiteit Leuven (Belgium) and has practiced professionally in the speech recognition, speech synthesis, and human language technology fields for more than 20 years. She has held several technical leadership roles at Unisys, Schneider Electric, and Convergys, and is now Lead Speech Scientist at the MITRE Corporation.









