A study of the "waiting experience" in speech interaction


Looking back at the history of human-computer interaction, we have passed through the command-line (CLI) era, the mouse-and-keyboard graphical (GUI) era, and the early, touch-based natural user interface (NUI) era. Each stage has been more natural, easier to learn, and more efficient overall than the one before it.

In the AI era, artificial intelligence gives machines three capabilities: perception, cognition, and natural language output. Perception lets machines understand human language, cognition lets them work out how to answer human questions, and natural language output lets them express themselves as humans do. Combined, these three capabilities bring human-computer interaction to the stage of voice interaction. Voice interaction is the most natural form of human-computer interaction: it greatly reduces the cost of learning to interact with machines, raises the overall efficiency of interaction to a new level, and has become a very important mode of human-computer interaction.

I. The "waiting experience" - one third of the voice interaction experience

In everyday human conversation, the exchange is a continuous cycle of "saying something to the other person", "waiting for the other person to reply", and "the other person giving a reply". "Waiting for a reply" therefore makes up one third of the conversation experience and directly affects how satisfied we are with the reply. If the other person is clearly thinking seriously while we wait, we feel valued; if their attention is elsewhere, we become suspicious of the reply even when it is a good one.

In voice interaction, the "waiting experience" likewise sits in the middle of the experience cycle and plays an important role in the overall experience. However, it has not been studied systematically in the industry and remains poorly understood.

1. Must the response time be as short as possible?

Dynatrace, a digital performance management platform, studied how users browse the web and found that loading a web page 0.5 seconds faster can raise a website's core conversion metrics by 10%. Minimizing wait times is therefore a constant goal in web and app design.

Unlike visual interaction, voice interaction naturally carries emotional attributes. Emotional experience is complex, however, and is not governed by the single variable of efficiency. In most everyday conversations, an answer that comes too quickly feels flippant, as if the other person is cutting in, while an answer that comes too slowly feels sluggish and dull.

So, what response times actually make for the best experience in voice interaction? What are the trends in response time experience?

2. What variables affect the waiting experience?

In visual design, when designing the loading state of a page, designers often ease the user's uneasiness and reduce the bounce rate by showing a progress bar or by using a playful, emotional design.

In voice interaction, however, the medium of speech is intangible or has no fixed form, and there is not even an interface to carry a loading state. Which variables affect the waiting experience in this case, and how large is their effect?

In short, although the waiting experience matters in voice interaction, it is still shrouded in fog. We therefore take smart speakers, currently the main carrier of voice interaction, as an example and conduct a focused study of the waiting experience in AI products.

II. A study of the waiting experience of smart speakers

Current smart speakers mainly use a voice interaction flow in which the user first wakes the device by voice and then gives a command. With this in mind, we can divide the use of a smart speaker into two main phases.

1) Wake-up phase: the user switches the speaker from the waiting state to the ready state with a specified wake-up word. The speaker can receive voice commands only after it has been woken up.

2) User request and feedback phase: the user gives a voice command, and the smart speaker returns a result that meets the user's need.

We studied these two phases in turn through the following three experiments.

Experiment 1: The effect of response time in the wake-up phase on the waiting experience;

Experiment 2: The effect of response time in the user request and feedback phase on the waiting experience;

Experiment 3: The effect of different feedback methods, such as visual and audio, on the waiting experience.

We elaborate on the findings of each experiment in turn below.

Experiment 1: The effect of response time in the wake-up phase on the waiting experience

To examine how the various factors in the wake-up phase affect the waiting experience, we gave users smart speakers with different wake-up response times and different wake-up feedback methods. After completing the experimental task, each user rated the speaker's wake-up response speed on a 5-point scale: too fast to accept; a little fast but acceptable; just right; a little slow but acceptable; too slow to accept.
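The comfort figures reported for this experiment are the share of users who chose "just right" on this scale. As a minimal sketch of that calculation, assuming ratings are stored as short keys (the key names and the sample data are illustrative, not taken from the study):

```python
from collections import Counter

# Short keys standing in for the five points of the rating scale (assumed names).
SCALE = ("too_fast", "a_little_fast", "just_right", "a_little_slow", "too_slow")

def comfort_rate(ratings):
    """Return the share of ratings equal to 'just_right' (0.0 to 1.0)."""
    if not ratings:
        raise ValueError("no ratings given")
    unknown = set(ratings) - set(SCALE)
    if unknown:
        raise ValueError(f"unknown rating keys: {sorted(unknown)}")
    return Counter(ratings)["just_right"] / len(ratings)

# Made-up ratings from ten hypothetical participants, not data from the study.
sample = ["just_right"] * 7 + ["a_little_slow"] * 2 + ["too_fast"]
print(comfort_rate(sample))  # 0.7
```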

The results of Experiment 1 indicate that the optimal wake-up response time is related to the wake-up feedback method, and that the optimal response time varies for different wake-up feedback methods.

1) When the wake-up feedback is light only, the faster the wake-up response the better. At 200ms, user comfort (the proportion of users who rate the response time as "just right") is highest, with 73% of users satisfied with the speed.

2) When the wake-up feedback is light + sound effect, the comfortable wake-up response time is around 300ms, with 76% of users satisfied with the speed.

3) When the wake-up feedback is light + voice, the comfortable wake-up response time is around 500ms, with 74% of users satisfied with the speed.

(Note: This experiment used the three mainstream wake-up feedback methods on the market (light, light + sound, and light + voice) to show, for reference, how the perceived response time differs under each. The choice of the best feedback method depends on factors other than response time as well and will be discussed in a separate chapter.)
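One possible way to apply these numbers in a product is to treat them as target delays for the wake-up feedback. The sketch below is only an illustration: the function name, the mode keys, and the idea of holding feedback until the target is reached are assumptions, not something the experiment prescribes.

```python
# Comfortable wake-up response times reported in Experiment 1, in milliseconds.
COMFORTABLE_WAKE_MS = {
    "light": 200,
    "light+sound": 300,
    "light+voice": 500,
}

def extra_wait_ms(feedback_mode, elapsed_ms):
    """Return how long to hold the wake-up feedback, in milliseconds.

    elapsed_ms is the time already spent since the wake word ended,
    for example on wake-word detection (hypothetical input).
    """
    if elapsed_ms < 0:
        raise ValueError("elapsed_ms must be non-negative")
    target = COMFORTABLE_WAKE_MS[feedback_mode]  # KeyError for unknown modes
    if feedback_mode == "light":
        # Light-only feedback: the article reports that faster is better,
        # so this sketch never delays it.
        return 0
    return max(0, target - elapsed_ms)

print(extra_wait_ms("light+voice", 120))  # 380
print(extra_wait_ms("light+sound", 450))  # 0
```

Whether padding a fast response actually improves comfort would need to be checked with users; the experiment only reports where comfort peaked.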

Experiment 2: Impact of Response Time on Waiting Experience in User Request and Feedback Phase

Since the response of the user request and feedback phase differs from the response of the wakeup phase in terms of technical implementation and user expectations, we investigate the optimal response time range for the user request and feedback phase through a second experiment. In our experiments, we provided users with smart speakers with different response time settings.

The main findings of Experiment 2:

1) Users rate response times of up to 1250ms as good, with 650ms giving the best experience. At 450ms, a small number of users find the response too fast and feel a sense of urgency and pressure that is hard to accept.

2) At 1450ms, 53% of users start to feel a delay in response, but it is still acceptable.

3) From 2150ms onward, 20% of users find the speaker's response too slow to be acceptable. We believe a product that leaves 20% of its users dissatisfied can no longer be called a great product.
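The thresholds from Experiment 2 can be summarized as a small classifier. This is a hedged sketch: the function name and band labels are invented for illustration, and the article reports only specific measured points (450ms, 650ms, 1250ms, 1450ms, 2150ms), so the treatment of values between those points is an assumption.

```python
def classify_response_time(ms):
    """Map a request-phase response time (ms) to a rough experience band."""
    if ms < 0:
        raise ValueError("ms must be non-negative")
    if ms <= 450:
        return "possibly too fast"   # a few users felt rushed at 450 ms
    if ms <= 1250:
        return "comfortable"         # best reported value: 650 ms
    if ms < 2150:
        return "noticeable delay"    # 53% noticed a delay at 1450 ms, still acceptable
    return "too slow"                # 20% found 2150 ms unacceptable

for value in (450, 650, 1450, 2150):
    print(value, classify_response_time(value))
```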

Experiment 3: The effect of different feedback methods, such as visual and audio, on the waiting experience

Since the response time of smart speakers currently on the market is generally above 1.5 seconds in the request and feedback phase, it falls outside the ideal response interval found in Experiment 2. We therefore used Experiment 3 to investigate how the design of the feedback modality affects users' perception of response speed, giving users five scenarios with different feedback modality designs.

Each of the five scenarios was tested with several response time settings.

The main finding of Experiment 3 is that the design of the feedback modality affects people's perception of the speaker's responsiveness:

1) Within 1250ms, Scenario D feels worse: the voice feedback gives the impression that the speaker is cutting in, and some users think the speaker responds too quickly.

2) From 1350ms to 2150ms, a higher proportion of users feel comfortable with Scenarios D and E. Adding a voice response or sound effect, such as the spoken "OK" in Scenario D, helps reduce the perceived latency and improves the perceived speed.

3) At response times of 3150ms and above, feedback design no longer mitigates the perceived latency, and such cases should be avoided as much as possible.
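These three findings can be read as a rule of thumb for when to play a short voice or sound cue while the user waits. The sketch below is illustrative only: the function name and return strings are assumptions, and latencies between the reported ranges are marked as not covered rather than guessed.

```python
def filler_cue_advice(expected_ms):
    """Map an expected response latency (ms) to the Experiment 3 findings."""
    if expected_ms < 0:
        raise ValueError("expected_ms must be non-negative")
    if expected_ms <= 1250:
        return "skip voice cue"          # a voice cue felt like cutting in
    if 1350 <= expected_ms <= 2150:
        return "add voice or sound cue"  # cues eased the perceived delay
    if expected_ms >= 3150:
        return "reduce latency"          # feedback design no longer helps
    return "not covered"                 # ranges the article does not report

for value in (800, 1500, 2500, 3200):
    print(value, filler_cue_advice(value))
```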

In addition, Experiment 3 found that expectations for response speed are related to user gender and task type. Female users are less tolerant of response time than male users: the longest time they can tolerate without any feedback from the speaker is, on average, shorter than for men. In other words, they expect feedback from the speaker sooner.

Users are also less tolerant of response time in control tasks than in tasks such as music and question answering, and they want more timely feedback in control tasks.

III. Summary

In this paper, we discuss the waiting experience in voice interaction and report an experimental human factors study of response times and feedback methods in the wake-up phase and the request and feedback phase, using smart speakers as an example. Because of limitations such as the experimental conditions and the sample size, the findings may not represent the full range of experiences of all smart speaker users at home. We nevertheless hope that this research can guide the design of response times and feedback methods for AI voice dialogue products and help create a natural, first-rate voice dialogue experience.

