Understanding The SGML Declaration

1. What Is An SGML Declaration?

Every SGML document defines a markup language in its prolog, and then uses that markup language, in its instance, to encode the information content of a document. Most SGML systems supply the "Reference Concrete Syntax" and the local system's character set (ASCII or EBCDIC) by default, and if these are adequate, no SGML Declaration is required. Using the Document Type Declaration alone, a large variety of powerful markup languages can be defined.

When implementing advanced text markup languages using SGML, the defaults chosen by most SGML systems for the character set, the delimiters (such as "<" for opening a start-tag), the maximum allowed length of names, the characters allowed in names, and so on, will usually be found to be inappropriate or inadequate. The designer of an SGML-defined text markup language will wish to specify these values for individual markup languages. This can be done using the SGML Declaration, which, when present, precedes the prolog containing the document type declaration, and the document instance.
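To make the effect of such settings concrete, the following is a minimal sketch, not part of any SGML system, of how a name check might depend on the maximum name length and the set of name characters. The function names and parameters (`is_valid_name`, `normalize_name`, `namelen`, `extra_start`, `extra_chars`, `fold_case`) are hypothetical and chosen for illustration only. The defaults mirror the Reference Concrete Syntax: names of at most 8 characters, starting with a letter and continuing with letters, digits, "-" or ".".

```python
import string

# Reference Concrete Syntax defaults: names are at most 8 characters long,
# start with a letter, and continue with letters, digits, "-" or ".".
REFERENCE_NAMELEN = 8
REFERENCE_EXTRA_NAME_CHARS = "-."


def is_valid_name(name, namelen=REFERENCE_NAMELEN, extra_start="",
                  extra_chars=REFERENCE_EXTRA_NAME_CHARS):
    """Return True if `name` is acceptable under the given (hypothetical) settings.

    extra_start: additional characters allowed anywhere in a name,
                 including the first position (accented letters, for example).
    extra_chars: additional characters allowed only after the first position.
    """
    if not name or len(name) > namelen:
        return False
    start_chars = string.ascii_letters + extra_start
    name_chars = start_chars + string.digits + extra_chars
    return name[0] in start_chars and all(c in name_chars for c in name[1:])


def normalize_name(name, fold_case=True):
    """Fold a name to upper case when case is not significant in names."""
    return name.upper() if fold_case else name
```

Under these assumptions, `is_valid_name("blockquote")` is rejected because the name has ten characters, while `is_valid_name("blockquote", namelen=32)` is accepted. Raising the limit in this way corresponds to what a designer would express in the SGML Declaration rather than in program code.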

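An SGML Declaration and a document type declaration provide the two components of a markup language: the SGML Declaration defines its lexical structure, and the document type declaration defines its syntactic structure.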

The lexical structure of a document defines how the components of information it contains are recognized; the syntactic structure defines interrelationships between these components.

Changes that will often need to be made in an SGML Declaration are:

  1. Adding to or replacing the short reference delimiter strings provided by the Reference Concrete Syntax.

  2. Changing the maximum length allowed for names (such as those of elements, attributes or entities), adding to the characters allowed in names (accented letters, for example), and controlling whether case is significant in names.

  3. Changing the maximum length allowed for literal strings (such as the values of internal entities or attributes).

  4. Changing the allowed forms of minimization (for example, SHORTTAG could be disabled).

  5. Changing the character set recognized (for example, to allow EBCDIC-coded documents to be parsed on a system that usually uses ASCII).

If any of these changes is needed for a document, an SGML Declaration must be placed at the start of the SGML document, with nothing preceding it except, optionally, spaces, tabs and line breaks.
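The placement rule above can be sketched as a simple check. This is a minimal illustration, not a parser: the function name `starts_with_sgml_declaration` is hypothetical, the check looks only for the opening characters `<!SGML`, and it assumes the keyword is written in upper case, as it usually is.

```python
# Opening characters of an SGML Declaration, assumed here to be in upper case.
SGML_DECL_OPEN = "<!SGML"

# Only spaces, tabs and line breaks may precede the declaration.
ALLOWED_LEADING = " \t\r\n"


def starts_with_sgml_declaration(text):
    """Return True if `text` begins with an SGML Declaration.

    Leading spaces, tabs and line breaks are skipped; anything else before
    the declaration (a comment, for example) makes the check fail.
    """
    stripped = text.lstrip(ALLOWED_LEADING)
    return stripped.startswith(SGML_DECL_OPEN)
```

A document that begins directly with its document type declaration makes the function return False, which corresponds to the case where the system's default settings are used.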

The following chapters explain how the SGML Declaration works. Chapter 2, "Character Sets in the SGML Declaration", explains character sets in detail. It should be read as a basis for understanding the terminology and concepts discussed in Chapter 3, "Syntax of an SGML Declaration", which describes how to write an SGML Declaration and the meaning of each of its parts. Two annexes are attached to this report: one provides a quick reference to the syntax of the SGML Declaration, and the other explains the System Declaration.


©Copyright Exoterica Corporation, All rights reserved.