This section contains three examples of SGML Declarations. The first illustrates using various parts of the SGML Declaration, the second illustrates supporting character sets other than ASCII, and the third illustrates the flexibility of SGML as a grammar-definition tool.
The following is an example SGML Declaration that illustrates many of the things that can be done. It is something of a hodgepodge of definitions and is intended only for illustration.
Practical SGML Declarations vary as little as possible from the reference versions and only use the features and capabilities needed for a particular application.
<!SGML "ISO 8879:1986"
CHARSET
BASESET "ISO 646-1983//CHARSET International Reference
Version (IRV)//ESC 2/5 4/0"
DESCSET
0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 96 32
-- Allow character 127 in documents as well. --
128 127 "High-order characters"
-- High-order characters are data. --
255 1 UNUSED -- Except that #255 is non-SGML. --
CAPACITY SGMLREF
TOTALCAP 50000 -- Set all the capacities to 50000. --
ENTCAP 50000
ENTCHCAP 50000
ELEMCAP 50000
GRPCAP 50000
EXGRPCAP 50000
EXNMCAP 50000
ATTCAP 50000
ATTCHCAP 50000
AVGRPCAP 50000
NOTCAP 50000
NOTCHCAP 50000
IDCAP 50000
IDREFCAP 50000
MAPCAP 50000
LKSETCAP 50000
LKNMCAP 50000
SCOPE INSTANCE -- The concrete syntax only applies
to the document instance. --
SYNTAX
SHUNCHAR 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
-- Shun just the above base characters. --
BASESET "ISO 646-1983//CHARSET International Reference
Version (IRV)//ESC 2/5 4/0"
DESCSET
0 128 0
FUNCTION
-- All the reference function characters. --
RE 13
RS 10
SPACE 32
TAB SEPCHAR 9
FF SEPCHAR 12
-- Except that form-feed is also white-space. --
DEL FUNCHAR 127
-- And DEL is an inert function. --
NAMING
LCNMSTRT "@$"
-- @ and $ can appear in names and can start them. --
UCNMSTRT "@$"
-- They are their own upper-case forms. --
LCNMCHAR ""
-- No other name characters, not even . and -. --
UCNMCHAR ""
NAMECASE GENERAL NO
ENTITY NO
-- Case is significant in all names. --
DELIM
GENERAL SGMLREF
-- Redefine three General Delimiters. With the following
definitions, declarations are entered as <!!! ... !!!>
instead of the usual <! ... > and comment declarations are
entered as <!!!*** ... ***!!!>.
--
COM "***"
MDO "<!!!"
MDC "!!!>"
SHORTREF NONE
-- Define only two Short Reference Delimiters. They could
be used as escape sequences for < and &.
--
"\<"
"\&"
NAMES SGMLREF
-- Change two of the keywords used in marked sections. --
IGNORE SKIP
INCLUDE DONTSKIP
QUANTITY SGMLREF -- Change NAMELEN and LITLEN. --
NAMELEN 32
LITLEN 2048
FEATURES
MINIMIZE DATATAG YES OMITTAG YES RANK YES
SHORTTAG YES
LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
OTHER CONCUR NO SUBDOC YES 1 FORMAL YES
APPINFO "WARNINGS YES"
-- Pass "WARNINGS YES" to the application. --
>
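The DESCSET part of the declaration above is a list of triples: the first document character number, the number of characters described, and their meaning (a base-set character number, a descriptive literal, or UNUSED). The following is a minimal Python sketch, not part of the original example, that expands those triples and checks that they describe every character number from 0 to 255 exactly once. The names `DESCSET` and `expand_descset` are hypothetical and chosen only for illustration.

```python
# Triples copied from the DESCSET of the declaration above:
# (first document character number, count, meaning).
# The meaning is a base-set character number, a literal describing
# the characters, or None for UNUSED (non-SGML characters).
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 96, 32),
    (128, 127, "High-order characters"),
    (255, 1, None),
]


def expand_descset(triples):
    """Return a dict mapping each described character number to its meaning.

    Raises ValueError if two triples describe the same character number.
    """
    meanings = {}
    for first, count, meaning in triples:
        for offset in range(count):
            number = first + offset
            if number in meanings:
                raise ValueError(f"character {number} is described twice")
            if isinstance(meaning, int):
                # Consecutive characters map to consecutive base-set numbers.
                meanings[number] = meaning + offset
            else:
                # A literal or UNUSED applies to every character in the range.
                meanings[number] = meaning
    return meanings


meanings = expand_descset(DESCSET)
missing = [n for n in range(256) if n not in meanings]
non_sgml = [n for n, meaning in meanings.items() if meaning is None]
print("undescribed character numbers:", missing)
print("non-SGML character numbers:", non_sgml)
```

A check of this kind catches overlapping or missing ranges, which are easy to introduce when counts are edited by hand.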
Supporting a character set other than ISO 646 does not require a change in the concrete syntax used: only the document character set definition needs to be changed. For example, the following document character set would serve for parsing a document coded in the EBCDIC character set.
EBCDIC has no "[" and "]" characters and has two extra characters: the cents symbol and the "not" symbol. The document character set solves this difficulty by simply assigning the EBCDIC cents symbol the meaning of "[" and assigning the EBCDIC "not" symbol the meaning of "]". All other EBCDIC characters are assigned the meanings of the corresponding characters from ISO 646.
CHARSET
BASESET "ISO 646-1983//CHARSET International Reference
Version (IRV)//ESC 2/5 4/0"
DESCSET
0 5 UNUSED
5 1 9 -- TAB (EBCDIC HT) --
6 7 UNUSED
13 1 13 -- RE (EBCDIC CR) --
14 23 UNUSED
37 1 10 -- RS (EBCDIC LF) --
38 26 UNUSED
64 1 32 -- SPACE --
65 9 UNUSED
74 1 91 -- [ (EBCDIC "cents" symbol) --
75 1 46 -- . --
76 1 60 -- < --
77 1 40 -- ( --
78 1 43 -- + --
79 1 124 -- | --
80 1 38 -- & --
81 9 UNUSED
90 1 33 -- ! --
91 1 36 -- $ --
92 1 42 -- * --
93 1 41 -- ) --
94 1 59 -- ; --
95 1 93 -- ] (EBCDIC "not" symbol) --
96 1 45 -- - --
97 1 47 -- / --
98 9 UNUSED
107 1 44 -- , --
108 1 37 -- % --
109 1 95 -- _ --
110 1 62 -- > --
111 1 63 -- ? --
112 9 UNUSED
121 1 96 -- ` --
122 1 58 -- : --
123 1 35 -- # --
124 1 64 -- @ --
125 1 39 -- ' --
126 1 61 -- = --
127 1 34 -- " --
128 1 UNUSED
129 9 97 -- abcdefghi --
138 7 UNUSED
145 9 106 -- jklmnopqr --
154 7 UNUSED
161 1 126 -- ~ --
162 8 115 -- stuvwxyz --
170 22 UNUSED
192 1 123 -- { --
193 9 65 -- ABCDEFGHI --
202 6 UNUSED
208 1 125 -- } --
209 9 74 -- JKLMNOPQR --
218 6 UNUSED
224 1 92 -- \ --
225 1 UNUSED
226 8 83 -- STUVWXYZ --
234 6 UNUSED
240 10 48 -- 0123456789 --
250 6 UNUSED
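To make the effect of this document character set concrete, here is a small Python sketch, not part of the original example, that translates EBCDIC-coded bytes into ISO 646 characters using a handful of the triples above. The names `TRIPLES`, `build_table`, and `ebcdic_to_iso646` are hypothetical, and only a subset of the table is copied, so any other byte is rejected.

```python
# A few triples copied from the EBCDIC document character set above:
# (EBCDIC code, count, ISO 646 code). Codes not listed here are
# treated as unmapped by this sketch.
TRIPLES = [
    (64, 1, 32),     # SPACE
    (74, 1, 91),     # cents symbol, given the meaning of "["
    (95, 1, 93),     # "not" symbol, given the meaning of "]"
    (129, 9, 97),    # a-i
    (145, 9, 106),   # j-r
    (162, 8, 115),   # s-z
    (240, 10, 48),   # 0-9
]


def build_table(triples):
    """Expand (first, count, base) triples into a code-to-code dict."""
    table = {}
    for first, count, base in triples:
        for offset in range(count):
            table[first + offset] = base + offset
    return table


def ebcdic_to_iso646(data, table):
    """Translate EBCDIC bytes to a string; raise on unmapped bytes."""
    chars = []
    for byte in data:
        if byte not in table:
            raise ValueError(f"byte {byte} has no meaning in this sketch")
        chars.append(chr(table[byte]))
    return "".join(chars)


TABLE = build_table(TRIPLES)
sample = bytes([0x81, 0x4A, 0xF1, 0x5F])  # a, cents, 1, "not"
print(ebcdic_to_iso646(sample, TABLE))    # prints a[1]
```

The cents and "not" symbols come out as "[" and "]", which is the substitution described above.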
The following is an SGML Document Entity that contains the grammar for, and an example of, a small subset of Rich Text Format, a word-processor interchange language developed by Microsoft.
<!SGML "ISO 8879:1986"
CHARSET
BASESET "ISO 646-1983//CHARSET International Reference
Version (IRV)//ESC 2/5 4/0"
DESCSET
0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
CAPACITY PUBLIC "ISO 8879-1986//CAPACITY Reference//EN"
SCOPE INSTANCE
SYNTAX
SHUNCHAR 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET "ISO 646-1983//CHARSET International Reference
Version (IRV)//ESC 2/5 4/0"
DESCSET
0 128 0
FUNCTION
RE 13
RS 10
SPACE 32
TAB SEPCHAR 9
NAMING
LCNMSTRT ""
UCNMSTRT ""
LCNMCHAR "-."
UCNMCHAR "-."
NAMECASE GENERAL NO
ENTITY NO
DELIM
GENERAL SGMLREF
SHORTREF NONE
"&#RE;" "&#SPACE;" "\'" "{" "}" "\{" "\}"
"0" "1" "2" "3" "4" "5" "6" "7"
"8" "9" "a" "b" "c" "d" "e" "f"
"\b" "\par" "\f" "\fs"
NAMES SGMLREF
QUANTITY SGMLREF
FEATURES
MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
OTHER CONCUR NO SUBDOC NO FORMAL YES
APPINFO NONE
>
<!DOCTYPE rtfdoc [
<!ELEMENT rtfdoc - O (rtf)>
<!ENTITY % command "b | par | f | fs">
<!ELEMENT rtf - - (rtf | %command; | rtfchar | #PCDATA)*>
<!ELEMENT (%command;) - O (#PCDATA)>
<!ELEMENT rtfchar - - (high, low)>
<!ELEMENT (high, low) - O EMPTY>
<!ATTLIST (high, low) value (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
8 | 9 | A | B | C | D | E | F)
#REQUIRED>
<!SHORTREF rtfmap "\b" b "\par" par "\f" f "\fs" fs "\'" rtfchar
"\{" openbrc "\}" closebrc "{" s.rtf "}" e.rtf
"&#RE;" null>
<!SHORTREF cmdmap "\b" b "\par" par "\f" f "\fs" fs "\'" rtfchar
"\{" openbrc "\}" closebrc "{" s.rtf "}" e.rtf
"&#RE;" null "&#SPACE;" e.tag>
<!ENTITY b STARTTAG "b">
<!ENTITY par STARTTAG "par">
<!ENTITY f STARTTAG "f">
<!ENTITY fs STARTTAG "fs">
<!ENTITY s.rtf STARTTAG "rtf">
<!ENTITY e.rtf ENDTAG "rtf">
<!ENTITY rtfchar STARTTAG "rtfchar">
<!ENTITY e.tag ENDTAG "">
<!ENTITY openbrc CDATA "{">
<!ENTITY closebrc CDATA "}">
<!ENTITY null "">
<!USEMAP rtfmap rtfdoc>
<!USEMAP cmdmap (%command;)>
<!SHORTREF high "0" high0 "1" high1 "2" high2 "3" high3
"4" high4 "5" high5 "6" high6 "7" high7
"8" high8 "9" high9 "a" highA "b" highB
"c" highC "d" highD "e" highE "f" highF>
<!ENTITY high0 "<high 0><!USEMAP low>">
<!ENTITY high1 "<high 1><!USEMAP low>">
<!ENTITY high2 "<high 2><!USEMAP low>">
<!ENTITY high3 "<high 3><!USEMAP low>">
<!ENTITY high4 "<high 4><!USEMAP low>">
<!ENTITY high5 "<high 5><!USEMAP low>">
<!ENTITY high6 "<high 6><!USEMAP low>">
<!ENTITY high7 "<high 7><!USEMAP low>">
<!ENTITY high8 "<high 8><!USEMAP low>">
<!ENTITY high9 "<high 9><!USEMAP low>">
<!ENTITY highA "<high A><!USEMAP low>">
<!ENTITY highB "<high B><!USEMAP low>">
<!ENTITY highC "<high C><!USEMAP low>">
<!ENTITY highD "<high D><!USEMAP low>">
<!ENTITY highE "<high E><!USEMAP low>">
<!ENTITY highF "<high F><!USEMAP low>">
<!USEMAP high rtfchar>
<!SHORTREF low "0" low0 "1" low1 "2" low2 "3" low3
"4" low4 "5" low5 "6" low6 "7" low7
"8" low8 "9" low9 "a" lowA "b" lowB
"c" lowC "d" lowD "e" lowE "f" lowF>
<!ENTITY low0 "<low 0></rtfchar>">
<!ENTITY low1 "<low 1></rtfchar>">
<!ENTITY low2 "<low 2></rtfchar>">
<!ENTITY low3 "<low 3></rtfchar>">
<!ENTITY low4 "<low 4></rtfchar>">
<!ENTITY low5 "<low 5></rtfchar>">
<!ENTITY low6 "<low 6></rtfchar>">
<!ENTITY low7 "<low 7></rtfchar>">
<!ENTITY low8 "<low 8></rtfchar>">
<!ENTITY low9 "<low 9></rtfchar>">
<!ENTITY lowA "<low A></rtfchar>">
<!ENTITY lowB "<low B></rtfchar>">
<!ENTITY lowC "<low C></rtfchar>">
<!ENTITY lowD "<low D></rtfchar>">
<!ENTITY lowE "<low E></rtfchar>">
<!ENTITY lowF "<low F></rtfchar>">
]>
{\b\f2\fs24 One paragraph.\par
Another paragraph with a {\b bold} wor
d, \{braces\}, and a special character
at the end.\'a5\par}
The instance following the DTD is SGML even though it does not use the familiar angle brackets "<" and ">". Most uses of SGML do not require the definition of markup languages this far from the norm. On the other hand, this is a good illustration that markup languages can be tailored to individual needs.
The following is the same sample document in a more familiar form, to illustrate the structure and information that the RTF markup captures. Note that it is clumsier than the RTF.
<rtfdoc>
<rtf><b></b><f>
2
</f><fs>
24
</fs>One paragraph.<par></par>Another paragraph with a <rtf><b></b>bold
</rtf> word, {braces}, and a special character at the end.<rtfchar>
<high value="A">
<low value="5">
</rtfchar><par></par></rtf>
</rtfdoc>
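The RTF instance is recognized as markup because each short reference delimiter in the current map is replaced by a reference to its entity. The Python sketch below, which is not part of the original example, splits text into delimiters and data using the pairs from rtfmap and prefers the longest delimiter at each position, so that "\fs" is not mistaken for "\f". It is a deliberately simplified model: it ignores the switching between maps (cmdmap, high, and low), tag omission, and entity expansion, and the names `RTFMAP`, `DELIMS`, and `tokenize` are hypothetical.

```python
# Delimiter-to-entity pairs copied from the rtfmap short reference map.
# "\n" stands in for the record end ("&#RE;") in this sketch.
RTFMAP = {
    "\\b": "b", "\\par": "par", "\\f": "f", "\\fs": "fs",
    "\\'": "rtfchar", "\\{": "openbrc", "\\}": "closebrc",
    "{": "s.rtf", "}": "e.rtf", "\n": "null",
}

# Try longer delimiters first so that "\fs" wins over "\f".
DELIMS = sorted(RTFMAP, key=len, reverse=True)


def tokenize(text):
    """Split text into ("ref", entity name) and ("data", characters) pairs."""
    tokens = []
    data = []
    i = 0
    while i < len(text):
        match = next((d for d in DELIMS if text.startswith(d, i)), None)
        if match is None:
            # Not a delimiter: accumulate the character as data.
            data.append(text[i])
            i += 1
            continue
        if data:
            tokens.append(("data", "".join(data)))
            data = []
        tokens.append(("ref", RTFMAP[match]))
        i += len(match)
    if data:
        tokens.append(("data", "".join(data)))
    return tokens


for token in tokenize("{\\b\\f2\\fs24 One paragraph.\\par\n"):
    print(token)
```

In the real document the space after a command is itself a short reference (mapped to e.tag in cmdmap), which is why the familiar form above shows the command elements closed before the text that follows them; the sketch leaves such spaces in the data.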
©Copyright Exoterica Corporation, All rights reserved.