Media: Captions and Transcripts

Sometimes simple text on the page, and even static images, is not enough. Video is becoming more and more important for web pages of all kinds, and that isn’t about to change anytime soon. Understanding how to present video (and audio) in a responsible, accessible way is critical for content creators.

Transcripts

Sometimes the simplest and most successful way to make the content of a video or audio file accessible is to provide a transcript. This is especially true of audio files: since an audio file has no visual presentation to connect captions to, making it accessible almost always requires a transcript. The transcript might be presented on the page itself, or as a link to an external document that users who need it (or want it) can access.

A complete transcript should include all of the important information conveyed in the audio or video file. As a baseline, it should contain all of the words that are spoken in the video. At the same time, in the case of video, if there is visual content that is important to a user’s understanding of the video, that should also be adequately conveyed in the transcript.

Captions

In many cases, captions are an excellent alternative to transcripts for video. They make the video itself perceivable for a user who can’t hear the audio track. Text is presented on the screen in blocks that roughly align with when the words are spoken in the video.

Many video hosting providers, like YouTube, provide free automatic captions for video. This can be a big help for users who depend on captions, but at this time automatic captions are not adequate to meet all the needs that good captions should meet.

Captions should be accurate, indicate different speakers, and include appropriate punctuation in addition to the words that are spoken. It’s important to review and edit captions to make sure they’re complete and accurate before using the video on your web page.

Audio Description

Captions solve the problems that video presents for one set of users: those who cannot hear the audio. However, other users have other problems with video that are equally important. Users who cannot see the video need to be able to understand what is going on visually as they listen to the audio track.

Audio description is more challenging than captioning at this time. While there are automatic systems for generating captions that may be somewhat accurate, computers are not yet smart enough to identify the important parts of the action and describe them to a user automatically.

In addition, many of the popular video platforms that we have today, like YouTube, do not yet have the capability to create and present an additional, optional audio track that would add descriptions of the action to the existing audio track.

If the video shows only people speaking, an audio description may not be necessary, because the visual part of the video adds nothing to the information provided.

There are a handful of ways to try to address this challenge:

Provide a transcript (see above)
Ensure that the audio track describes everything going on in the visual field (in the case of a video of a PowerPoint presentation, the speaker could make sure to describe all of the images on the slides while going through the presentation)
Provide a secondary video with an audio description added to the existing audio track

In the end, the simplest and most effective way to make sure your users are getting the information they need is to also present that information as text. That can be time-consuming and challenging.

Some video platforms that generate automatic captions make it possible to export the captions as a single file, which may be a good place to start in the production of a transcript.
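
As a rough illustration of that starting point, the sketch below strips the cue numbers and timings from an exported caption file and keeps only the words. It assumes the export is in SRT or WebVTT format, which is common but not guaranteed for every platform, and the function name captions_to_transcript is hypothetical. The result is a first draft only: it still needs human review, speaker labels, and descriptions of any important visual content.

```python
import re

# Matches SRT ("00:00:01,000 --> 00:00:04,000") and WebVTT
# ("00:00:01.000 --> 00:00:04.000" or "00:01.000 --> 00:04.000") timing lines.
TIMING_LINE = re.compile(
    r"^\s*(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)


def captions_to_transcript(caption_text: str) -> str:
    """Return the spoken text of SRT/WebVTT captions as one draft paragraph."""
    spoken_lines = []
    # Cues are separated by blank lines in both formats.
    for block in re.split(r"\n\s*\n", caption_text.strip()):
        block_lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Locate the timing line; blocks without one (such as the
        # "WEBVTT" header) carry no caption text and are skipped.
        timing_index = next(
            (i for i, ln in enumerate(block_lines) if TIMING_LINE.match(ln)),
            None,
        )
        if timing_index is None:
            continue
        for ln in block_lines[timing_index + 1:]:
            # Drop inline markup such as <i>...</i>. Note that this also
            # drops WebVTT voice tags, so speakers must be re-added by hand.
            text = re.sub(r"<[^>]+>", "", ln).strip()
            if text:
                spoken_lines.append(text)
    return " ".join(spoken_lines)


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "1\n00:00:01.000 --> 00:00:04.000\nHello, and welcome.\n\n"
        "2\n00:00:04.500 --> 00:00:07.000\n<i>Today</i> we talk about captions.\n"
    )
    print(captions_to_transcript(sample))
```

Keeping the output as plain text makes it easy to paste into a document and edit, which is where most of the real work on a transcript happens.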

A Note on Caption and Transcript Accuracy

Companies that provide human captioning services for video like to talk about caption accuracy in terms of a percentage that is acceptable — and it’s satisfying to think about being able to guarantee something like 99% accuracy.

However, the standards for caption accuracy that exist (which come down from the FCC) do not establish a true percentage of accuracy that is acceptable. The FCC requires that captions be accurate, include appropriate punctuation, and that they identify speakers as necessary. The implication of the standard that captions be accurate and not 99% accurate (or some other value) is that in practice, no amount of inaccuracy is technically acceptable. Mistakes will happen, but those standards underline the importance of being sure that people who know the subject matter review all captions to make sure they’re as accurate as we can make them.
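
To see why a percentage can be misleading, it helps to work out how many mistakes a given figure still allows. The sketch below is only arithmetic: the function name errors_permitted is hypothetical, and the example assumes a ten-minute video spoken at about 150 words per minute, a figure chosen purely for illustration.

```python
def errors_permitted(word_count: int, accuracy_percent: float) -> float:
    """Number of wrong words a stated accuracy percentage would still allow."""
    return word_count * (100 - accuracy_percent) / 100


if __name__ == "__main__":
    # Assumed for illustration: 10 minutes at roughly 150 words per minute.
    words_in_video = 10 * 150
    print(errors_permitted(words_in_video, 99))
```

Under those assumptions, a 99% guarantee still leaves room for 15 wrong words in a single short video, and any one of them could change the meaning of a sentence.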
