Does audio narration help eLearners?

Published on: June 1, 2017

Read time: 6 minutes


Unpacking the value add of audio narration in online learning

Is more media always better?

eLearning courses frequently present the same content in different forms. Typically, audio voiceover will also be displayed as text. Sometimes the learner has an option to hide the text and listen to the audio, or mute the audio and read the text, choosing one sensory channel over the other. But often the text is displayed along with the audio whether the learner wants to experience both or not.

Of course, being able to read closed-captioned text is crucial for learners who have hearing impairments, just as being able to listen to the content is essential for learners with visual impairments. But there is also the misconception that everyone benefits from multi-channel presentation: if receiving the information through one channel is good, receiving it through multiple channels must be better. It is widely thought that redundancy helps fill in missed signals. If learners have missed the visual text content, they may still have received the audio content successfully, and vice versa.

What the research says

Unfortunately, this folk wisdom isn’t supported by either experience or research. Remember the last time you sat through a slide presentation where the presenter read the text from the slides, almost word-for-word? Chances are you would have preferred either to read the content at your own pace (online or in hardcopy) or to have the presenter simply speak the content and use the slides for complementary, supporting graphics.

The research is based on two theories. The first, cognitive load theory (see Sweller & Chandler, 1994, to learn more), states that our working memory may be overburdened when multiple sources of information must be assimilated simultaneously. Generally, increasing the cognitive load through multiple, redundant information channels decreases learning efficiency.

The second, the cognitive theory of multimedia learning, suggests that there are two information channels, visual and auditory, that can process information in parallel. The visual channel is most efficient at processing graphic images. Spoken or narrated text is processed through the auditory channel. A visual graphic accompanied by a spoken explanation activates the parallel processing power of the two channels, leading to better learning. But if onscreen text and graphic images are both presented in the visual channel, the capacity of that channel is exceeded, and meaningful learning is much less likely to occur (see Mayer, Heiser, & Lonn, 2001, to learn more).

The situation is further complicated by the fact that there are different types of learning, some easier than others. The easiest form of learning is fairly superficial. We can recognize or identify something we’ve learned, or retain certain facts. For example, many people can correctly identify Julius Caesar as an ancient Roman general and statesman, which is an example of learning a fact for retention. Deeper learning, that is, being able to analyze and apply learning to new situations, is more difficult. Far fewer people would be able to discuss Caesar’s role in the collapse of the Roman Republic and his influence on later European politics. That is an example of deep learning, which takes much more effort.

In addition, most of us are not very good at judging how well we’ve learned something. We tend to overestimate how much we comprehend based on how familiar we are with the topic. The more times we’re exposed to a subject, the better we think we understand it. We often, and mistakenly, believe that awareness is the same as deep understanding (see Fenesi & Kim, 2014, to learn more).

In an eLearning course, when onscreen text is also narrated, we are exposed to the same material twice: once through the visual channel and once through the auditory channel. If our learning task is only at the retention level, then the duplication of material is slightly beneficial, and we remember a little bit more than if we had experienced the content in only one channel.

However, if the learning task requires deeper learning that can be applied to new situations, the redundancy of both visual and auditory presentations actually impedes learning. The kicker is that because we generally don’t evaluate our own learning accurately, we think we’ve learned the course material, but we really haven’t, or we’ve learned it only at a superficial level. If the course evaluation is also at the superficial level, testing retention of facts rather than the application of the course material, the learner might do well on the evaluation but not be able to transfer the learning to the job.

Engaging, interactive and effective eLearning

This is a critical distinction in the field of workplace training. Most organizations want their staff not just to remember certain facts, but to apply skills and information on the job in dynamic situations. They want eLearning that is engaging and interactive so that learners will be able to do their jobs more safely and productively. Duplicating narration and onscreen text defeats this purpose.


Clark, R.C., Nguyen, F., and Sweller, J. (2006). Efficiency in learning: Evidence-based guidelines to manage cognitive load. San Francisco: Wiley.

Fenesi, B. and Kim, J.A. (2014). Learners misperceive the benefits of redundant text in multimedia learning. Frontiers in Psychology, 5, 1-7.

Mayer, R.E., Heiser, J., and Lonn, S. (2001). Cognitive constraints on multimedia learning: When presenting more material results in less understanding. Journal of Educational Psychology, 93(1), 187-198.

Sweller, J. and Chandler, P. (1994). Why some material is difficult to learn. Cognition and Instruction, 12(3), 185-233.
