[Accessibility-handlers] Unified Use Cases for Expert Handlers (Final?) Draft 2.04

Sun Feb 3 13:58:12 PST 2008

aloha, all!

apologies for the short turnaround time and notice, but the semi-final 
draft of the UUC document is available for final review at:

http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04

and is included as plain text following my signature; note that i 
incorporated NS' introductory paragraph, as submitted in:

https://lists.linux-foundation.org/pipermail/accessibility-handlers/2008-January/000153.html

and a revised Footnote 2, which may need to be reordered so that the last
paragraph precedes the generalized list...

comments, corrections, criticism, and feedback of any and all types are
welcome -- please reply-to this post to log any of the aforementioned,
so that we can discuss and integrate them into the finalized draft...

thank you all for your participation, your input, your insights, and 
your prose -- i think we have a pretty tight document ready to be 
exposed to the wider world, but that'll be decided tomorrow at 1900h
UTC

http://www.linux-foundation.org/en/Accessibility/Handlers/Meetings/Agenda20080204.html

gregory.

---------------- BEGIN FINAL? UNIFIED USE CASES DRAFT ------------------

 Unified Use Cases for Expert Handlers (Draft 2.04)

   document status: internal draft -- this is a work-in-progress
   revision date: 2008-02-03

   previous versions: Draft 2.03b (2008-01-28), Draft 2.03 (2008-01-26), 
                      Draft 2.02b (2008-01-20), Draft 2.02a (2008-01-14), 
                      Draft 2.01 (2007-12-22)

   authors: Pete Brunet, Vladimir Bulatov, Gregory J. Rosmaita, Janina
            Sajka and Neil Soiffer (chair, Expert Handlers SIG)

   edited and annotated by Gregory J. Rosmaita
     _________________________________________________________________

   please provide feedback on this draft to the Expert Handlers SIG
   either via the Expert Handlers emailing list (preferred) or
   directly to this document's "Discussion" page to which posted
   comments will be appended by the editor. There is also a [1]Scratch
   Pad for the Unified Use Cases which serves as a collection point for
   issues and ideas related to expert handlers and its possible
   implementations.

     _________________________________________________________________

   Contents

        * 1. Unified Use Cases for Expert Handlers (Draft 2.04)
               + 1.1 Introduction: What Are Expert Handlers?
               + 1.2 Speech Output Use Cases for Expert Handlers
               + 1.3 Alternative Input Use Cases
               + 1.4 Navigability Use Cases
               + 1.5 Magnification Use Cases
               + 1.6 Braille Display, Embossing and Tactile Conversion Use
                     Cases
               + 1.7 Universal Use Cases
                        o 1.7.1 Universal Use Case 1: Where Am I?
                        o 1.7.2 Universal Use Case 2: Document Summary

        * 2. Putting It All Together: Expert Handlers and the Flow of
             Control

        * 3. Footnotes
     _________________________________________________________________

Introduction: What Are Expert Handlers?

   The purpose and responsibility of accessibility interfaces, such as
   Microsoft Active Accessibility (MSAA) and IAccessible2 (IA2), is to
   provide assistive technology (AT) with the ability to access and
   interact with the information contained in an application. This allows
   an AT to access the information in the application's DOM.
   Interpreting, displaying, and navigating the information is the
   responsibility of the AT. Generalized markup is sometimes complemented
   by markup specifications that facilitate more semantically precise
   content markup.

   Assistive technology typically handles generalized content markup, but
   does not know about specialized markup. Because of this, users of AT
   are unable to access or navigate specialized markup effectively.
   Generalized content markup (such as [2]HTML) is complemented by
   markup specifications that facilitate more semantically precise
   content markup. Examples of specialized, semantically precise markup
   include [3]MathML and [4]MusicXML.

   The Open A11y expert handlers SIG is exploring a standardized plug-in
   mechanism to AT software. The goal of this plug-in standard is to
   allow AT software to take advantage of expert software that
   understands specialized markup. This plug-in standard will allow the
   expert software to provide enhanced, semantically rich access to
   specialized markup so that the AT can properly render (visually,
   aurally, and/or tactilely) and help users navigate the semantic
   meaning encoded in the specialized markup.

   To better inform [delineate? specify?] what needs to be supported by
   an expert handler interface standard, the following sections discuss a
   number of use cases for an expert handler. The use cases are divided
   into various functionalities such as [5]speech, [6]navigation, and
   [7]braille generation. The [8]last section discusses the options for
   how an expert handler might fit into the sequence of events that
   eventually results in a response to a user action.

 Speech Output Use Cases for Expert Handlers

   Computer users who are blind or severely visually impaired often use
   assistive technology (AT) built around synthetic text to speech (TTS).
   These AT applications are commonly called "screen readers." Screen
   reader users listen to a synthetic voice rendering of on screen
   content because they are physically unable to see this content on a
   computer display monitor.

   Because synthetic voice rendering is intrinsically temporal, whereas
   on screen displays are (or can easily be made) static, various
   strategies are provided by screen readers to allow users to tightly
   control the alternative TTS rendering. Screen reader users often find
   it useful, for instance, to skim through content until a particular
   portion is located and then examine that portion in a more controlled
   manner, perhaps word by word or even character by rendered character.
   It is almost never useful to wait for a synthetic voice rendering that
   begins at the upper left of the screen and proceeds left to right, row
   by row, until it reaches the bottom because such a procedure is
   temporally inefficient, requiring the user to strain to hear just the
   portion desired in the midst of unsought content. Thus, screen readers
   provide mechanisms that allow the user to focus anywhere in the
   content and examine only that content which is of interest.

   Screen readers have proven highly effective at providing their users
   access to content which is intrinsically textual and linear in nature.
   It is not hard to provide mechanisms to focus synthetic voice
   rendering paragraph by paragraph, sentence by sentence, word by word,
   or character by character.

   Access to on screen widgets has also proven effective by rendering
   that static content in list form, where the user can pick from a menu
   of options using up and down arrow plus the enter key to indicate a
   selection, in lieu of picking an icon on screen using a mouse.

   Access to content arrayed in a table can also succeed by allowing the
   AT to simulate the process a sighted user employs to consider tables.
   In other words, mechanisms are provided to hear the contents of a cell
   and also the row and column labels for that cell (which define the
   cell's meaning).

   Similar "smart" content rendering and navigation strategies are
   required by screen reader users in more complex, nonlinear content
   such as mathematical (chemical, biological, etc) expressions, music,
   and graphical renderings. Because such content is generally the
   province of knowledge domain experts and students, and not the domain
   of most computer users, screen readers do not invest the significant
   resources necessary to serve only a small portion of their customer
   base with specialized routines for such content. Furthermore, the
   general rendering and navigation strategies provided for linear
   (textual), menu, and tabular content are woefully insufficient to
   allow users to examine specific portions of such domain specific
   expressions effectively. On the other hand domain specific markup
   often does provide sufficient specificity so that the focus and
   rendering needs of the screen reader can be well supported.

   In order to gain effective access to such domain specific content
   screen reader users require technology that can:

     * Synthetically voice the expression in a logical order
     * Allow the user to focus on particular, logical portions of
       expressions possibly at several layers of granularity
     * Appropriately voice specialized symbols and symbolic expressions
     _________________________________________________________________

 Alternative Input Use Cases

   There are users with disabilities who do not require accommodation in
   order to read domain specific markup. Rather, these users require
   assistive technologies to facilitate their scrolling and/or editing of
   content. Highly effective assistive technologies exist to accommodate
   alternative input strategies ranging from:

     * speech input technology, such as [9]DragonDictate or
       [10]NaturallySpeaking;

     * mouse and keyboard alternative systems, such as the [11]GNOME
       On-Screen Keyboard (GOK), [12]Jambu, which provides improved web
       accessibility for switch and alternative pointer users, and
       [13]OpenEyes, an open-source open-hardware toolkit for real-time
       eye-tracking;

     * context-aware word-prediction technologies, such as [14]Dasher.

   Users of alternative input assistive technologies require two specific
   accommodations for scrolling and editing domain specific content:

    1. Context aware expedited scrolling and navigation. The Navigability
       Use Cases outlined in this document will serve this requirement.

    2. Knowledge domain context aware command and content vocabulary for
       speech based navigation systems and for word-prediction systems.
     _________________________________________________________________

 Navigability Use Cases

   AT users need to be able to navigate within sub-components of
   documents containing specialized content, such as math, music or
   chemical markup. Typically these specialized components have content
   which needs to receive "focus" at different levels of granularity,
   e.g. a numerator within a numerator, an expression, a term, a bar of
   music, etc.

   Within each level, functions are needed in response to AT commands to
   inspect and navigate to and from "items" (e.g., by word, bar,
   expression, clause, term, depending upon the type of content being
   expressed) for a particular level of granularity:

    1. contextual query/inspection of object/glyph with current focus
       ("[15]Where Am I?")
    2. character-by-character
    3. previous/current/next item
    4. all items with user-defined characteristics
    5. all items in an author-defined category ([16]footnote 1)
    6. first/last item on a line
    7. first/last item within next higher or lower level of granularity
    8. first/last item in the document

   There are two scenarios to consider, a read-only scenario and a
   scenario where the user is editing the document.

   There are three system components that need to interact: the user
   agent, e.g. a browser, the AT, and the expert handler.

   In the read-only case, the AT responds to some sort of "Point of
   Regard" change event and depending on the "role" of the object which
   received focus, the AT fetches accessibility information pertinent to
   that role and then formats/outputs a response tailored to an AT user,
   e.g. TTS/braille. In the case of specialized content, an expert
   handler needs to be used by the AT because the AT doesn't know how to
   deal with such specialized content directly.

   In order to meaningfully interact with the specialized content, the
   user needs to be able to execute the following actions:

     * change level of granularity up/down
     * read all from top
     * read all from Point of Regard (POR)
     * goto and read first/last item on the current line
     * goto and read first/last item within the next less/more granular
       item
     * goto and read first/last item in the document
     * goto and read previous/current/next item

   In the case of editable content there may also be a desire to have
   separate cursors, e.g. one to remain at the POR (the caret, if
   editing), and one to move around for review purposes.

   The AT will already have UI input commands for most of the above
   functions, but probably not for changing to higher/lower levels of
   granularity. When the user asks for a finer or coarser level of
   granularity, the AT would call the handler to change the level of
   granularity. The AT will handle the UI commands and
   in turn call the handler to return an item at the current level of
   granularity. The AT would have told the handler about the output mode,
   e.g. braille or TTS. Armed with those three things: level of
   granularity, mode of output, and which item (first, last, previous,
   current, next), the handler knows what to do.
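
   As a rough illustration of the three inputs named above (level of
   granularity, mode of output, and which item), the following Python
   sketch models a handler for one small math expression. Every name
   in it (ToyHandler, LEVELS, set_output_mode, change_granularity,
   get_item) is hypothetical and is not part of any existing expert
   handler interface; resetting to the first item when the level
   changes is likewise only an assumption.

```python
from dataclasses import dataclass

# Hypothetical levels of granularity, ordered from coarsest to finest.
LEVELS = ["expression", "term", "character"]


@dataclass
class ToyHandler:
    """Toy stand-in for an expert handler; not a real interface."""
    items: dict                # maps a level name to its list of items
    level: int = 0             # index into LEVELS
    output_mode: str = "tts"   # "tts" or "braille", as told by the AT
    position: int = 0          # index of the current item at this level

    def set_output_mode(self, mode):
        if mode not in ("tts", "braille"):
            raise ValueError(f"unknown output mode: {mode}")
        self.output_mode = mode

    def change_granularity(self, step):
        """Move finer (positive step) or coarser (negative step), clamped."""
        self.level = max(0, min(len(LEVELS) - 1, self.level + step))
        self.position = 0  # assumption: restart at the first item
        return LEVELS[self.level]

    def get_item(self, which):
        """Return the first/last/previous/current/next item at this level."""
        items = self.items[LEVELS[self.level]]
        last = len(items) - 1
        targets = {
            "first": 0,
            "last": last,
            "previous": max(0, self.position - 1),
            "current": self.position,
            "next": min(last, self.position + 1),
        }
        self.position = targets[which]
        return f"[{self.output_mode}] {items[self.position]}"


# The same expression, pre-split at each hypothetical level.
handler = ToyHandler(items={
    "expression": ["a + b/c"],
    "term": ["a", "b/c"],
    "character": ["a", "+", "b", "/", "c"],
})
```

   In this sketch the AT keeps its own key bindings and only forwards
   the request, for example handler.change_granularity(+1) followed by
   handler.get_item("next").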

   In the case of editable content, the UA provides the input UI for the
   user. This editing capability would most likely be provided via a
   plugin. An example of such a plugin is needed so that one can evaluate
   what accessibility features need to be added to the existing editors.
     _________________________________________________________________

 Magnification Use Cases

   A common use of magnification is to proportionately enlarge content.
   For text-based (or more generally, font-based) applications, this
   means that AT software should be able to request rendering with larger
   sized fonts or a certain amount of magnification relative to some
   baseline magnification. Applications beyond standard text-based ones
   include math, music, and labeled plots/graphics. For non text-based
   applications such as graphics and chemical structures, magnification
   could be based on a certain percentage of the normal size or given by
   "fill this area". These two ideas can always be mapped onto each
   other. In all of these cases, the magnification may be due to having
   the entire document magnified or it may be due to a request to
   magnify an individual instance (such as an equation).
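
   The mapping between a percentage of the normal size and "fill this
   area" can be sketched as follows, assuming that the content has a
   known natural width and height and that magnification is uniform
   (aspect ratio preserved). The function names are hypothetical.

```python
def scale_to_fill(natural_w, natural_h, area_w, area_h):
    """Largest uniform scale factor at which the content fits the area."""
    return min(area_w / natural_w, area_h / natural_h)


def area_for_scale(natural_w, natural_h, scale):
    """Width and height the content occupies at the given scale factor."""
    return (natural_w * scale, natural_h * scale)
```

   For a 100 by 50 graphic and a 400 by 300 area, scale_to_fill gives
   a factor of 4.0 (that is, 400%), and area_for_scale shows that the
   graphic then occupies 400 by 200 of that area.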

   There are two other uses for magnification:

    1. While navigating or speaking, it might be desirable to magnify the
       part being navigated/spoken to make it easier to see. For example,
       while playing some music, the current measure and next measure
       might be magnified to ease reading while leaving the rest
       unmagnified so that the amount of screen space used is minimized.
       There also needs to be a method to reset the magnification.

    2. Math and Chemical notation shrink fonts for superscripts and
       subscripts. In math, these are further reduced for nested scripts.
       One common feature for math renderers is to set a minimum font
       size. Typically, this is 50% of the base font size and corresponds
       to the size used for doubly nested scripts. It is potentially
       useful to allow the AT to control the maximum percent shrinkage
       used by renderers. Another possibility is to have a feature that
       says "don't shrink at all". Although the rendering would not be
       considered high quality typesetting, it does make scripts more
       readable to those with some vision impairment.
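
   The minimum font size behaviour described in item 2 can be sketched
   as a clamp. The per-level multiplier of 0.71 is an assumption,
   chosen so that two levels of nesting land near 50% of the base
   size; the function and parameter names are hypothetical.

```python
def script_font_size(base_size, depth, multiplier=0.71, min_fraction=0.5):
    """Font size for a script nested `depth` levels deep.

    Each level shrinks the font by `multiplier`, but the result never
    drops below `min_fraction` of the base size.
    """
    scaled = base_size * multiplier ** depth
    return max(scaled, base_size * min_fraction)
```

   An AT that wants "don't shrink at all" would pass min_fraction=1.0,
   which makes every nesting depth render at the base size.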
     _________________________________________________________________

 Braille Display, Embossing and Tactile Conversion Use Cases

   An Expert Handler should be able to provide braille data for braille
   display output by generic AT. Custom braille output is needed, because
   generic AT has no knowledge about how specific specialized data can
   and should be represented via braille. An example is mathematics:
   there are many different braille codes used to represent mathematics
   that vary from country to country and agency to agency, which affects
   how [17]MathML is translated into a tactile form.

   Simple ASCII strings are normally used to communicate braille to
   braille devices. However, there are a lot of specific ASCII-to-dots
   pattern-encoding tables used to generate braille that conforms to a
   natural language's braille conventions. Therefore, the AT and the
   expert handler have to negotiate which braille table is to be used.
   A more universal approach would be to use the Unicode braille
   pattern characters in the range U+2800 to U+28FF.
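
   The Unicode approach avoids table negotiation because each code
   point in that range encodes its dot pattern directly: dot n sets
   bit n-1 of the offset from U+2800. A minimal Python sketch follows
   (the function names are illustrative only):

```python
BRAILLE_BASE = 0x2800  # first code point of the braille patterns range


def dots_to_char(dots):
    """Return the Unicode braille character for the given dot numbers."""
    mask = 0
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"dot number out of range: {dot}")
        mask |= 1 << (dot - 1)  # dot n is bit n-1 of the offset
    return chr(BRAILLE_BASE + mask)


def char_to_dots(char):
    """Return the sorted dot numbers raised in a Unicode braille character."""
    offset = ord(char) - BRAILLE_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError("not a braille pattern character")
    return [dot for dot in range(1, 9) if offset & (1 << (dot - 1))]
```

   Which dot patterns represent a given mathematical symbol still
   depends on the braille code in use; the sketch covers only the
   transport encoding between the expert handler and the AT.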

   There is also a need to have braille output tailored to various levels
   of granularity. For example, at a low level of granularity, the user
   would receive an overall description of the mathematical expression or
   image, while at the highest level of granularity, the user would
   receive a complete braille translation of the whole math expression or
   list of all labeled components of the image.

   Some data may need to be expressed in a more advanced tactile output
   format than refreshable braille. For example, graphical data would
   greatly benefit from being embossed on paper or a 2D braille display.
   Input devices, such as a touchpad or camera, allow a user to
   communicate to the computer which parts of the graphic the user is
   interested in and needs to have tactilely displayed. Such interactive
   functionality should be left exclusively to the expert handler. This
   means that an expert handler must have an interactive mode and a way
   for an AT to trigger/toggle this mode on. In such a mode, an AT should
   also provide a way for the expert handler to produce more than one
   output stream -- such as simultaneous speech and braille output --
   directly via an AT device which uses the same TTS engine and/or
   braille display.
     _________________________________________________________________

 Universal Use Cases

  Universal Use Case 1: Where Am I?

   The user must have a means of obtaining all available information
   about the object/character with focus, beginning with the repetition
   of the character or the programmatic binding which describes the
   object with focus. The ability to query the AT to determine one's
   point of regard within a document and within containers in the
   document is essential. The user must be able to obtain information
   about the current point of regard, from the most generic level -- what
   percentage of the document or section has been read, how much of the
   document or section remains to be read -- to the most atomic.
   Therefore, an AT must create a User Interface where successive "Where
   Am I?" queries by the user generate more verbose or more terse
   responses. ([18]footnote 2)
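
   One possible shape for such successive queries is sketched below,
   assuming the AT holds a list of descriptions ordered from most
   generic to most atomic and steps through it on each repeated query.
   The class name, method names, and sample strings are hypothetical.

```python
class WhereAmI:
    """Steps through descriptions, generic first, on repeated queries."""

    def __init__(self, descriptions):
        self.descriptions = list(descriptions)
        self.calls = 0

    def query(self):
        """Return the next, more specific description; repeat the last."""
        index = min(self.calls, len(self.descriptions) - 1)
        self.calls += 1
        return self.descriptions[index]

    def reset(self):
        """Start over, e.g. after the point of regard has moved."""
        self.calls = 0


where = WhereAmI([
    "40 percent of the document read",
    "section 2 of 5, equation 1 of 3",
    "numerator of the fraction",
    "character x",
])
```

   Reversing the list would give the opposite behaviour, where each
   repeated query produces a more terse response.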

  Universal Use Case 2: Document Summary

   A user may find it necessary to consult a "Document Summary",
   containing a list of the types of elements and containers in the
   document. The user needs to know the document title and language as
   well as the number of tables, links, headings, frames, forms,
   controls, items, images, and pages. The application may implement a
   document summary feature natively through its own UI instead of an
   accessibility API, but in the case of specialized markup, may need the
   assistance of an expert handler in order to present an appropriate
   document summary for the content being summarized.
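
   As a minimal sketch of the counting half of a document summary, the
   following Python tallies element names in a well-formed XML
   document using the standard library. It is an illustration only: a
   real AT would obtain this information through the DOM or an
   accessibility API, and the summarize function is a hypothetical
   name.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def summarize(markup):
    """Count how many times each element name occurs in the document."""
    root = ET.fromstring(markup)
    # iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())
```

   For specialized markup the raw counts are not meaningful to most
   users, which is where an expert handler would supply an appropriate
   summary instead.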
     _________________________________________________________________

 Putting It All Together: Expert Handlers and the Flow of Control

   The goal of the [19]Expert Handlers working group is to define a
   standard so that AT software can call on expert software to interpret
   specialized markup. One issue that needs to be addressed is how and
   where (in the flow of control of reading a page) the expert handler
   should be invoked. Here are three possibilities:

     * During installation, the expert handler registers itself with the
       rendering application (e.g., the web browser, PDF viewer, etc.).
       When the page is loaded, the handler is invoked by the renderer to
       convert the DOM or some proxy for the DOM node to make it appear
       to have non-expert content. For example, it might convert the
       specialized markup to text or some generalized markup that AT can
       typically handle.
     * The AT traverses the DOM and when it gets to some node it doesn't
       understand, it consults some resource that associates a particular
       handler with the node name. It gets the node's content (which
       might include other nodes) from the DOM and passes that content to
       the expert handler. It then issues requests to the handler (e.g.,
       "give me text to speak for the content").
     * The AT traverses the DOM and when it gets to some node it doesn't
       understand, it consults some resource that associates a particular
       handler with the node name. It then points the expert handler to
       that node and issues requests to the handler. In this case, the
       handler is directly interacting with the DOM.
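
   The AT-driven possibilities can be sketched as a traversal with a
   lookup table keyed by node name. The sketch below uses Python's
   standard xml.etree.ElementTree as a stand-in for the DOM; the
   HANDLERS table, the register and speak functions, and
   toy_math_handler are all hypothetical names, not part of any
   proposed interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource that associates a handler with a node name.
HANDLERS = {}


def register(node_name, handler):
    HANDLERS[node_name] = handler


def speak(node):
    """Return the list of strings the AT would send to the TTS engine."""
    handler = HANDLERS.get(node.tag)
    if handler is not None:
        # The AT does not understand this node: delegate the subtree.
        return [handler(node)]
    parts = []
    if node.text and node.text.strip():
        parts.append(node.text.strip())
    for child in node:
        parts.extend(speak(child))
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


def toy_math_handler(node):
    """Toy handler: reads the tokens of a math subtree in document order."""
    tokens = [text.strip() for text in node.itertext() if text.strip()]
    return "math: " + " ".join(tokens)


register("math", toy_math_handler)
page = ET.fromstring(
    "<p>Solve <math><mi>x</mi><mo>+</mo><mn>1</mn></math> now</p>"
)
```

   In this toy the handler receives the node itself, so the difference
   between passing a copy of the content and pointing the handler at
   the live DOM does not show; the considerations that follow are what
   separate the two in practice.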

   Although similar, the latter two cases probably have implications on
   the difficulty of implementation and the capabilities of the
   interface. Some of these are:

     * If the expert handler directly reads the DOM, then the expert
       handler must understand MSAA, [20]IA2, or whatever is appropriate
       for the level of functionality it needs. This also implies that
       the rendering application must support those standards. If not,
       the expert handler would need to know how to access application
       specific DOMs.
     * If the expert handler is given a copy of what resides in the DOM,
       then interacting with the content (e.g., filling in a text field)
       would complicate any standard that is developed because support
       for passing info about input would need to be part of the
       standard.
     _________________________________________________________________

 Footnotes

   Note 1. for example, the FIELDSET, LEGEND, LABEL grouping and
   labelling mechanisms for FORM controls or the headers/id relationship
   defined for TABLE in [21]HTML 4.01/[22]XHTML 1.0 or the [23]ARIA
   markup "[24]labelledby" and "[25]describedby"
     * [26]return to the text following note 1

   Note 2. For each potential point of regard possible in a specific
   Generalized Markup Language, the AT requires, and can usually obtain 
   from the document's structure and semantics, as reflected in the DOM, 
   the following element characteristics, if they exist, depending on the
   type of elements in the item at the current POR:

     * For all locations:
         1. Number of items in the document
         2. Relative item number (n of total) within the document
         3. Document title

     * Table information if in a table:
         1. Caption and table summary
         2. Content for row and column headers
         3. Relative number (n of total number) for the table in the
            document
         4. Relative row and column number (x of total, y of total)
            within parent table
         5. Table type/role (data, spreadsheet, calendar)

     * Section information if in a section:
         1. Section type (page, frame, heading)
         2. Section title
         3. Relative number (n of total) for the section type in the
            document
         4. Level if in a section with a heading
         5. Relative item number (n of total) within the section

     * Form control information if on a form control:
         1. Group label for a control (such as LEGEND or OPTGROUP in
            HTML) if in a group
         2. Label or alternative text (such as title or alt in HTML or
            title and desc in SVG)
         3. Type of form control (role)
         4. State
         5. Relative number (n of total) of the parent form in the
            document
         6. Relative form control number (n of total) within the parent
            form

     * Map information if in a map:
         1. Relative area number (n of total) within the areas of a map
         2. Title attribute for map

     * List or menu information if within a menu or list:
         1. Type (role) - menu, simple list, definition list, ordered
            list, folder, navigation bar, and so on
         2. Title from parent menu or list
         3. Relative number (n of total) of parent list or menu in the
            document
         4. Relative list item number (n of total) within the list

     * Link information if on a link:
         1. Relative link number (n of total) within the document
         2. Link state: visited, unvisited, active, focused, external or
            internal
         3. Extended information, such as that provided by the title
            attribute

   For [27]Specialized MarkUp Languages, the above list of points of
   regard needs to be broadened and abstracted into a context meaningful 
   to the content and structure achieved through the use of a particular 
   specialized markup language. For example, a musical score marked up in
   an XML-derived dialect would frame its points of reference in a manner
   conformant with the structure of the content being accessed: by stanza, 
   by bar, by note, and so on. The level of granularity necessary to 
   provide meaningful interaction between the user of an AT and a specific 
   markup language is highly dependent upon the type of specialized 
   content being described, as well as the parameters and structures 
   inherent to the specialized knowledge domain for which the specialized
   markup language has been designed. Expert handlers, therefore, need the 
   ability to cache ontologies specific to each type of specialized 
   content, in order to enable full interactivity with the specialized 
   content. Such ontologies can be provided through either the [28]Web
   Ontology Language (OWL) or the [29]Resource Description Framework (RDF)
   in order to provide an assistive technology with meaningful, and 
   appropriately structured, API calls and mappings.

     * [30]return to the text following note 2
     _________________________________________________________________

References

   1. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/ScratchPad
   2. http://www.linux-foundation.org/en/Accessibility/Handlers/References/GMLs#html-info
   3. http://www.linux-foundation.org/en/Accessibility/Handlers/References/SMLs#mathml-info
   4. http://www.linux-foundation.org/en/Accessibility/Handlers/References/SMLs#musicxml-info
   5. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#speech-output
   6. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#nav
   7. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#braille
   8. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#flow
   9. http://www.ddwin.com/dictate.htm
  10. http://www.ddwin.com/deluxe.htm
  11. http://www.gok.ca/
  12. http://www.oatsoft.org/trac/jambu
  13. http://www.oatsoft.org/Software/open-eyes
  14. http://www.inference.phy.cam.ac.uk/dasher/
  15. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#Universal_Use_Case_1:_Where_Am_I.3F
  16. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#fn1
  17. http://www.linux-foundation.org/en/Accessibility/Handlers/References/SMLs#mathml-info
  18. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#fn2
  19. http://www.linux-foundation.org/en/Accessibility/Handlers
  20. http://www.linux-foundation.org/en/Accessibility/IAccessible2
  21. http://www.w3.org/TR/html401
  22. http://www.w3.org/TR/xhtml10/
  23. http://www.w3.org/TR/aria-roadmap
  24. http://www.w3.org/TR/aria-roles/#labelledby 
<!-- change to: http://www.w3.org/TR/wai-aria/#labelledby when goes "live" -->
  25. http://www.w3.org/TR/aria-roles/#describedby
<!-- change to: http://www.w3.org/TR/wai-aria/#describedby when goes "live" -->
  26. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#fn1-back
  27. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Accessibility/Handlers/References/SMLs
  28. http://www.w3.org/TR/owl-ref/
  29. http://www.w3.org/TR/REC-rdf-syntax/
<!-- should link 29 link to the RDF spec or the RDF activity at http://www.w3.org/RDF -->
  30. http://www.linux-foundation.org/en/Accessibility/Handlers/UseCases/Unified/Draft2.04#fn2-back