[Accessibility-ia2] media a11y

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Thu Jun 16 15:35:08 PDT 2011


Hi Pete,

Thanks for your review and questions on the video side of things. I'm
hoping the combined expertise here will be able to define the best way
to deal with video and the track specification of HTML5 [1]. I may,
however, need to go into detail on how tracks and cues are handled in
HTML5 before we can come to the right solution.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#the-track-element

On Fri, Jun 17, 2011 at 5:36 AM, Pete Brunet <pete at a11ysoft.com> wrote:
> I've read through the discussion and have these comments and questions:
>
> What is the use case to justify an API (and associated synchronization
> complexity) for access to cues that is not solved by captions for those who
> can't hear the audio and audio descriptions of visual media for those who
> can't see the video?

The <track> element in HTML5 allows external text files to be
associated with audio and video elements. These files provide lists of
timed cues: caption cues, subtitle cues, text description cues and
chapter cues. This is in addition to what the audio or video file
itself can carry, which may include captions and subtitles as text,
audio descriptions as audio, and sign language as video.
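For concreteness, the association might look something like this in
markup (the file names and labels here are invented for illustration):

```
<video src="lecture.webm" controls>
  <!-- kind="captions": for viewers who cannot hear the audio -->
  <track kind="captions" src="lecture-captions.vtt"
         srclang="en" label="English captions">
  <!-- kind="descriptions": timed text descriptions of the visual content -->
  <track kind="descriptions" src="lecture-descriptions.vtt" srclang="en">
  <!-- kind="chapters": chapter titles for navigation -->
  <track kind="chapters" src="lecture-chapters.vtt" srclang="en">
</video>
```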

The reason I approached Alexander was to find out how to deal with
text descriptions. Text descriptions are something new that you may
not have seen in traditional accessibility approaches for audio and
video: they provide the text that would usually be spoken in audio
descriptions as actual timed text cues. The files are essentially the
same as caption files, with cues that have a start time, an end time
and some text. However, the expectation is that these text
descriptions are read out by a screen reader or handed to a braille
device, to be communicated to those who cannot see the video.
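A description cue file in the draft WebVTT format might look like the
following (cue times and text are invented for illustration):

```
WEBVTT

00:00:12.000 --> 00:00:15.000
The presenter writes the formula on the whiteboard.

00:00:42.500 --> 00:00:46.000
A diagram of the network topology appears on screen.
```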

In addition, it should probably be possible to also expose caption
cues (and subtitle cues, for that matter) to AT for those who can
neither hear nor see and want to consume them through braille. This
was, however, not my main use case.

Note that none of the text cues are part of the DOM of the Web page
but only live in the shadow DOM. Therefore, I guess, some method of
exposure is required.
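To illustrate what such exposure would have to deliver, here is a
rough sketch in JavaScript of the lookup that AT-side code might
perform, using plain objects in place of the real TextTrackCue
interfaces (the cue data is invented):

```javascript
// Minimal stand-ins for text track cues: each has a start time,
// an end time, and the text to hand to the screen reader.
const descriptionCues = [
  { startTime: 12.0, endTime: 15.0,
    text: "The presenter writes the formula on the whiteboard." },
  { startTime: 42.5, endTime: 46.0,
    text: "A diagram of the network topology appears on screen." },
];

// Return the cues active at the given playback time, i.e. those
// whose [startTime, endTime) interval contains it.
function activeCues(cues, currentTime) {
  return cues.filter(
    (cue) => cue.startTime <= currentTime && currentTime < cue.endTime
  );
}
```

In a real page these cue objects would come from the media element's
text tracks rather than being constructed by hand; the point is that
the AT needs some interface through which to obtain them, since they
are not reachable through the page DOM.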


> Since it's early in the discussion of this issue I think this topic needs to
> be separated from the rest of the discussion.  Alex can you move that to a
> separate section like you did for the Registry API?
>
> At least at this point I'm not in favor of the media control methods.
> Developers should provide accessible GUI controls.  The developer would have
> to implement the access in any case and having access through the GUI would
> eliminate adding the code for these new methods on both sides of the
> interface.  If the app developer does a correct implementation of the GUI
> there would be no extra coding required in ATs.

I guess the idea here was that there may be situations where AT needs
to overrule what is happening in the UI, for example when there are
audio and video resources that start autoplaying on a newly opened
page. However, I am not quite clear on this point either.

The key problem I saw with text descriptions and video controls is
that text descriptions are quite a special case: the author of the
descriptions can identify the breaks in the video timeline into which
a description cue needs to fit, and can provide the text to be spoken
in each break, but cannot know how long it will actually take to
voice or braille that text. Therefore, the AT in this case needs to
control the video's playback timeline, possibly putting it on hold
when the end time of a cue is reached until the AT has finished with
the cue's text. I would think this is one of the few cases where AT
actually has to control the display of the Web page rather than
being a mere observer.
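The pause-until-spoken behaviour described above can be sketched as a
small controller. This is only an illustration of the logic, assuming
a hypothetical player object with pause()/play() and a paused flag
standing in for the real media element, and hypothetical callbacks
from the AT:

```javascript
// Sketch: when playback reaches the end of a description cue and the
// screen reader has not yet finished voicing it, pause the video;
// when the AT reports completion, resume playback.
function makeDescriptionController(player) {
  let pendingCue = null; // cue currently being voiced by the AT

  return {
    // Called when a description cue becomes active.
    cueStarted(cue) {
      pendingCue = cue;
    },
    // Called on every playback time update.
    timeUpdated(currentTime) {
      if (pendingCue && currentTime >= pendingCue.endTime) {
        player.pause(); // hold the timeline until the AT is done
      }
    },
    // Called when the AT has finished speaking/brailling the cue.
    speechFinished() {
      pendingCue = null;
      if (player.paused) player.play();
    },
  };
}
```

Whether the browser performs this pausing itself or hands control to
the AT is exactly the kind of design question I hoped we could settle
here.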

Best Regards,
Silvia.

