Conversations with Things by Diana Deibel and Rebecca Evanhoe

Conversations with Things is a fascinating journey into the world of conversational design. Authors Diana Deibel and Rebecca Evanhoe have gone to great lengths to produce the introduction they wish they'd had when they started. Two chapters, 'Talking Like a Person' and 'Complex Conversations', demonstrate a deep understanding of the subject and feel grounded in many years of practice and analysis.

Although the book was written before generative AI burst into the world, it remains relevant. The chapters on defining intents and documenting conversational pathways could easily be seen now as methods of evaluation and testing for human alignment with LLM-based conversational tools. I hope the authors will one day consider a second edition that encompasses practices in the age of generative AI.

ISBN: 978-1933820-26-2
Published: April 2021
https://rosenfeldmedia.com/books/conversations-with-things/

Reading notes:

Talking Like a Person

The second chapter, titled "Talking Like a Person," explores various layers of complexity in human conversation. These themes interweave like the conversation structures they describe:

  • Conversation is co-created: Participants collaborate to achieve a shared goal or outcome.

  • Prosody and intonation are fundamental to spoken language, forming part of its structure rather than merely being add-ons in dialogue construction.

  • Turn-taking is the interplay through which conversation forms, encapsulating power structures and much more than its mechanics initially suggest.

  • Conversation unfolds in a messy manner; it is structured, but not always as a formal exchange of turns.

  • Repair: We are constantly repairing our conversations, bringing them back to a point where the process of co-creation works. According to Nick Enfield, this occurs approximately every 84 seconds when two people speak.

  • Accommodation involves a chain reaction of adjustments in response to each other and the situation during conversation.

  • Mirroring, or "limbic synchrony," entails matching posture, expressions, and gestures, as well as speech elements like pace, vocabulary, pronunciation, and accents; this process is called convergence.

  • Code-switching involves presenting different identities to elicit an outcome.

  • Politeness goes beyond a list of social constraints such as not licking a bowl in a restaurant; in conversation, it serves as a contract between parties. When considered alongside the concept of repairing, it leads to more dynamic and fluid ideas, as expressed by Onuigbo G. Nwoye:

"It’s a series of verbal strategies for keeping social interactions friction-free."

The authors regard Grice's Maxims as a somewhat simplistic foundation for conversation, covering cooperative principles but missing some important elements of conversational theory and design addressed in the points above.

The Rest of the Book

I read this book as part of my research into linguistic user interfaces being built around LLMs and AI chat. While other chapters in the book contain a wealth of valuable material, I am only pulling out a few subjects that are of specific interest to me at this time.

Common question types

In the section dealing with scripted flows, the authors outline the six most common question types posed to users. These form the building blocks of turn-taking in older conversational tools:

  1. Open-ended

  2. Menu

  3. Yes-or-no

  4. Location

  5. Quantifying

  6. Instructional

There are a couple of useful UI elements that work with these questions: a confirmation component and a repeat request.
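To make these notes concrete for myself, here is a minimal Python sketch of how the six question types and the two supporting elements might be modelled in a scripted flow. It is my own illustration, not code from the book: the names QuestionType, confirmation_prompt, and next_prompt are hypothetical, and the wording of the prompts is an assumption about how a confirmation and a repeat request could sound.

```python
from enum import Enum


class QuestionType(Enum):
    """The six question types listed above; the enum itself is illustrative."""
    OPEN_ENDED = "open-ended"
    MENU = "menu"
    YES_OR_NO = "yes-or-no"
    LOCATION = "location"
    QUANTIFYING = "quantifying"
    INSTRUCTIONAL = "instructional"


def confirmation_prompt(heard: str) -> str:
    """Hypothetical confirmation component: read the answer back to the user."""
    return f"I heard '{heard}'. Is that right?"


def next_prompt(question: str, confirmed: bool) -> str:
    """Hypothetical repeat request: ask the same question again if not confirmed."""
    if confirmed:
        return "Thanks, got it."
    return f"Sorry about that. {question}"
```

Note that the confirmation is itself a yes-or-no question, so a single quantifying question such as "How many tickets?" can expand into two or three turns once confirmation and repetition are included.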

Cognitive load with task order, lists, and prosody

Ordering tasks is crucial for reducing cognitive load. This seems to be most related to the sequencing of instructions.

Spoken lists significantly increase cognitive load for users. The authors delve into detail on reducing complexity to enhance recall, which greatly impacts menu structures.

The absence of prosody in a simple conversion of written text to TTS (Text-to-Speech) significantly increases cognitive load. Historically, designers have addressed this with SSML (Speech Synthesis Markup Language) and by scripting text in a dialogue style.
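As a rough illustration of the SSML approach, the sketch below builds markup for a short spoken menu, inserting a pause before each option and slowing the option text slightly. It is my own example, not taken from the book: the function name menu_to_ssml, the 400 ms pause, and the choice of a slow rate are assumptions, and TTS engines differ in which SSML elements they support. The speak, break, and prosody elements themselves come from the W3C SSML specification.

```python
from typing import Sequence
from xml.sax.saxutils import escape


def menu_to_ssml(intro: str, options: Sequence[str], pause_ms: int = 400) -> str:
    """Wrap a spoken menu in SSML, pausing before each option (illustrative only)."""
    parts = [f"<speak>{escape(intro)}"]
    for i, option in enumerate(options):
        # A short pause separates the options so they are easier to recall.
        parts.append(f'<break time="{pause_ms}ms"/>')
        # Mark the final option with "or", as a person would when speaking.
        if len(options) > 1 and i == len(options) - 1:
            parts.append("or ")
        parts.append(f'<prosody rate="slow">{escape(option)}</prosody>')
    parts.append("</speak>")
    return "".join(parts)


ssml = menu_to_ssml("You can choose", ["tea", "coffee"])
```

The resulting string can be passed to any TTS service that accepts SSML input; escaping the text first keeps characters such as "&" from breaking the markup.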

"Human conversation is multimodal"

Human conversation is multimodal. This simple statement was one of the strongest messages I took away from the book. We operate by blending all our senses simultaneously to facilitate conversation flow. We employ visual body language along with prosody and the content of our words to communicate effectively.

Lisa Falkson of the Alexa team found that when users are presented with visual and audio information together, they often mute the audio to focus on the visual information.

You can still utilize visual and audio information if you are employing strong visualization reinforced by audio. Lisa uses Alexa’s weather as an example. If you do this, the elements need to be synced. Lisa refers to this as the "temporal binding window" of 400 milliseconds.
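A simple way to reason about that window is as a tolerance on the gap between when the audio and the visual element start. The sketch below is my own and rests on an assumption: it treats the 400 milliseconds as the maximum allowed offset in either direction, which is one possible reading of the figure. The names BINDING_WINDOW_MS and is_synced are hypothetical.

```python
BINDING_WINDOW_MS = 400  # figure quoted above; interpretation as max offset is assumed


def is_synced(audio_start_ms: int, visual_start_ms: int,
              window_ms: int = BINDING_WINDOW_MS) -> bool:
    """Return True if audio and visual start within the window of each other."""
    return abs(audio_start_ms - visual_start_ms) <= window_ms
```

For example, a weather card that appears 250 ms after the spoken forecast begins would pass this check, while one that appears 500 ms later would not.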

Follow-up links and reading:

https://www.cambridge.org/core/books/using-language/4E7EBC4EC742C26436F6CF187C43F239
https://www.researchgate.net/publication/231870679
https://onlinelibrary.wiley.com/doi/book/10.1002/9781118247273
https://en.wikipedia.org/wiki/Turn-taking
https://en.wikipedia.org/wiki/Conversation_analysis
https://www.hachettebookgroup.com/titles/n-j-enfield/how-we-talk/9780465059942/?lens=basic-books
