Somewhere
[ acknowledgement1 ]
between the complexity of SGML and
the rigidity of HTML lies the eXtensible Markup Language (XML).
XML lets you describe the structure of a document. In return,
you need a Document Type Definition (DTD) before you can check that
an XML document is valid.
XML documents are well_formed if they follow some
simple rules that allow them to be parsed. These rules
are outlined below.
A well_formed document is also valid if it matches a Document
Type Definition (DTD). The DTD has to be declared at the start of
the document along with things like the XML version and the character
encoding. There are many different DTDs for different purposes.
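The difference between a well_formed document and a broken one can be sketched with a parser. The following is only an illustration, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for this example. Note that this parser checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False otherwise.

    This checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<novel><title>War and Peace</title></novel>"
bad = "<novel><title>War and Peace</novel>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```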
XML documents can be processed if the processing is described
in XSL. To display an XML document you need to supply some kind
of mapping into a particular "style". Thus we now have style sheet
languages : XSL, PSL, P, ...
See the W3C information
[ Style ]
on style sheets and style sheet languages.
The W3 Consortium supports the Web. The independent site
[ http://w3schools.com/ ]
provides a family of tools for learning the technology.
- SimpleNovel::= See http://www.megginson.com/texts/darkness/novel.dtd.
(MathML): Structure of mathematical formulae.
[ REC-MathML ]
with syntax
[ appendixE.html ]
+ XML dsssl stylesheets rtf tex jade
[ mml-files ]
(OpenMath):
[ http://www.openmath.org/ ]
(DITA): Darwin Information Typing Architecture (DITA XML)
[ dita.html ]
(CML): Chemical Markup Language --
[ http://www.xml-cml.org/ ]
(W3D): Replacement for the Virtual Reality Modeling Language.
[ Specifications ]
(HRMML): XML based Human Resource Management Markup Language:
[ main.html ]
(DocBook): Structure of documentation for software documents. DocBook
is (in 1999) actually an SGML based way to document software.
See
[ DocBook in comp.text.SGML ]
for more information.
(XMI): XMI::="Presents meta-data for modeling objects", by CORBA and
the Object Management Group.
The following needs an Id and password
//ftp.pmg.org/pub/docs/ad/98-1005.pdf
XMI is also integrated with the Unified Modeling Language 1.3
standard
[ uml.html ]
(CBL): XML Common Business Library
[ cblfaq.html ]
There are many more sample DTDs at
[ resources.html?keys=*5266 ]
[ http://www.xmltree.com/ ]
(XMLRepository.com):
[ http://xmlrepository.com/ ]
XML information by Dick Baldwin
[ http://xml.about.com/ ]
About.com is the new name for the organization that was previously known as
The Mining Company.
(XML Query Language - Frequently Asked Questions): xml
[ http://metalab.unc.edu/xql/ ]
(Human Resources Markup Language): xml
[ home.htm ]
(MathML Files: DSSSL style sheet): xml
[ mml-files ]
(Cover pages documentation): xml
[ xml.html ]
(XML FAQ): xml
[ http://www.ucc.ie/xml/ ]
(XML-QL: A Query Language for XML): xml
[ http://www.w3.org/TR/NOTE-xml-ql/ ]
(XML and web services at DDJ): languages
[ http://www.ddj.com/topics/xml/ ]
First, XML is like HTML; however, there are vital differences:
- None of the tags used in HTML are predefined in XML.
- You can add new tags to XML.
- XML is Case Sensitive
- In XML, WhiteSpace is significant
- XML is not about layout and look-and-feel. It is about structure and meaning.
- Five predefined entities: gt(>), lt(<), quot("), amp(&), apos(').
- End tags are never omitted. <t....> ... </t>
- There is a special kind of tag which does not enclose some content <.../>
- Comments look like this <!-- ..... -->
- Processing can be embedded <?....?>
- Attributes always have a name and a value, and the value is between double quotes or apostrophes: name="value" or name='value'.
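Several of the differences listed above can be observed with a parser. This is a small sketch, assuming Python's standard xml.etree.ElementTree module; the tag and attribute names (note, Body, body, Priority) are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# A made-up document: <Body> and <body> are different tags because XML is
# case sensitive, <body/> and <timestamp .../> are empty elements, and the
# attribute values use both kinds of quotes.
doc = ET.fromstring(
    "<note Priority='high'>"
    "<Body>Fish &amp; chips &lt; 5 pounds</Body>"
    "<body/>"
    '<timestamp date="1999/06/22"/>'
    "</note>"
)

tags = [child.tag for child in doc]
body_text = doc.find("Body").text  # predefined entities are decoded

print(tags)
print(body_text)
print(doc.get("Priority"), doc.get("priority"))  # attribute names are case sensitive too
```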
Here is a simple description of all documents that might be in XML -- ignoring
the context dependencies:
- XMLBNF::=following,
Net
After a prolog, comes a single
entity called the root, and then some miscellaneous stuff that
is probably meaningless:
- document::= prolog root miscellaneous.
- prolog::=xml_type #comment dtd.
A well formed document should start with a prolog that
identifies the version of
XML it uses. For example
<?xml version="1.0"?>
declares version 1.0 of XML.
The prolog
should also identify the character encoding - especially if you need to use
any non-"ASCII" characters. Namespaces are identified separately, by
xmlns="...." attributes on elements.
- xml_namespace::lexeme=
"xmlns".
- root::= tagged_element | empty_element.
- miscellaneous::= #(comment | processing | WS).
- WS::=white space.
- tagged_element::= "<" tag #attribute ">" content "</" tag ">", -- the
tag at the start and end must be the same. To be valid the tag must be
defined in a DTD and have attributes and content that match
the rules in the DTD. A tagged element contains other data -- between the two tags.
<title>War and Peace</title>
- empty_element::="<" empty_tag #attribute "/>".
<timestamp date="1999/06/22" time="11:00"/>
- singleton::= empty_element.
- content::= #( parsed_data | element | comment ),
the valid sequences of pieces in a content are described by a regular
expression form in the DTD. An element is either a
tagged_element or an empty_element:
- (element)|-element==>tagged_element | empty_element.
- parsed_data::= #(char ~ ("<" | ">" | "&" | ";" | "'") | entity ).
- entity::= predefined_entity | defined_entity.
- predefined_entity::=gt | lt | quot | amp | apos.
- gt::="&gt;", stands for ">".
- lt::="&lt;", stands for "<".
- quot::="&quot;", stands for "\"".
- amp::="&amp;", stands for "&".
- apos::="&apos;", stands for "'".
- comment::= "<!--" ... "-->".
<!-- this is a comment -->
- attribute::= name "=" quoted_value.
date="1999/06/22"
time='11"11'
- quoted_value::=quotes value quotes | apostrophe value apostrophe.
- quotes::="\"".
- apostrophe::="'".
- defined_entity::=defined in prolog.
- parsed_data::=defined in prolog.
- tag::=defined in prolog or namespace,
- |-tag ==> O( namespace ":") name.
- name::=defined in prolog.
- value::=defined in prolog.
(End of Net
XMLBNF)
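The predefined entities in the net above can be produced and removed mechanically. Here is a minimal sketch, assuming the escape, unescape and quoteattr helpers from Python's standard xml.sax.saxutils module; the dictionaries QUOTES and UNQUOTES and the sample text are made up for this example.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# escape() always handles &, < and >; the two quote characters are
# supplied as extra entities (an assumption of this sketch).
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = 'Tom & Jerry say "1 < 2"'
encoded = escape(raw, QUOTES)
decoded = unescape(encoded, UNQUOTES)

print(encoded)
print(decoded == raw)

# quoteattr() picks a quote character that does not clash with the value.
print(quoteattr('11"11'))
```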
The actual rule for quoting is a little more complex in that the quote
character can not appear inside the value:
- quoted_value::= | [ q:quotes|apostrophe ] q #(char~q) q,
or the union with q equal to quotes or apostrophe of....
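The quoting rule can be written as a regular expression. This is a sketch only, assuming Python's standard re module; the names QUOTED_VALUE and is_quoted_value are invented here. It captures only the rule stated above (the quote character cannot appear inside the value) and ignores the other restrictions that real XML places on attribute values.

```python
import re

# Either "..." with no double quote inside, or '...' with no apostrophe inside.
QUOTED_VALUE = re.compile(r'"[^"]*"' + "|" + r"'[^']*'")


def is_quoted_value(text: str) -> bool:
    """Return True if the whole of `text` is one quoted value."""
    return QUOTED_VALUE.fullmatch(text) is not None


print(is_quoted_value('"1999/06/22"'))
print(is_quoted_value("'11\"11'"))   # a double quote inside apostrophes
print(is_quoted_value('"11"11"'))    # the quote character appears inside the value
```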
To be valid the entities, tags and their attributes must match a set of rules
given in a DTD.
Suppose that we specify a DTD that has a set of normal tag names T
and a set of content free (empty element) tag names C,
and for each tag t:T|C we have attribute names N(t),
and for each tag t:T|C, q:quotes|apostrophe, and attribute n:N(t), we have a set of valid
values V(t,n,q), and D is the raw data in our document.
Then define
- a(t)::= #( |[n:N(t), q:quotes|apostrophe]( n "=" q V(t,n,q) q ) ), a sequence of names with valid quoted values,
and
- c(t, e)::=an expression describing the valid content of tag t in terms of elements e,
and then an element of type t is defined by
- e(t)::= [t:T]( "<" t a(t) ">" c(t, e) "</" t ">" ) | [t:C]( "<" t a(t) "/>" ),
and an element is the union over all tags
- element::= D | |[t:T|C](e(t)).
Note: there is a trick above... the content expression c(t,e) depends on
all the elements as a function associating tag names to elements of
that type. It is probably best to think of this as an array
or vector indexed by entity names. The resulting grammar
is context dependent but can be formalized using only a small variation of
context free grammars.
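Part of this validity check can be sketched in code. The following is an illustration under stated assumptions, not a validator: it uses Python's standard xml.etree.ElementTree module, the sets T, C and N are made-up stand-ins for what a DTD would declare, and the content expression c(t, e) and the value sets V are not checked at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations standing in for a DTD.
T = {"novel", "title"}    # normal tags
C = {"timestamp"}         # content free tags
N = {"novel": set(), "title": {"lang"}, "timestamp": {"date", "time"}}


def check(element):
    """Return a list of problems found in `element` and its descendants."""
    problems = []
    tag = element.tag
    if tag not in T | C:
        problems.append(f"unknown tag <{tag}>")
    else:
        extra = set(element.attrib) - N[tag]
        if extra:
            problems.append(f"<{tag}> has undeclared attributes {sorted(extra)}")
        if tag in C and (len(element) or (element.text or "").strip()):
            problems.append(f"<{tag}> must be empty")
    for child in element:
        problems.extend(check(child))
    return problems


valid = ET.fromstring(
    '<novel><title lang="en">War and Peace</title>'
    '<timestamp date="1999/06/22" time="11:00"/></novel>'
)
invalid = ET.fromstring('<novel><chapter/><timestamp zone="UTC">now</timestamp></novel>')

print(check(valid))
print(check(invalid))
```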
The "data" (D above) can include elements that indicate some processing to be
done to the data like this "<?.....?>".
- processing::= "<?" tag parameters "?>".
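A parser hands such processing elements to the application rather than interpreting them. Here is a small sketch, assuming Python's standard xml.dom.minidom module; the xml-stylesheet instruction and the file name style.xsl are just sample data.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<novel/>"
)

# Collect (target, data) for every processing instruction at document level.
# The <?xml ...?> declaration itself is not reported as one.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```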
It is possible to name things (like files of data or strings) and
use the names in place of the things -- but the rules are a little
convoluted.
The dtd above is a document type declaration and has many forms.
Here are some simple ones:
- dtd::= "<!DOCTYPE " WS name O(WS externalId) OWS O( localdtd ) ">".
- externalId::= ("PUBLIC" | "SYSTEM") WS string_identifying_a_dtd_file.
- localdtd::= "[" #(markup_declaration| ... | WS) "]" OWS.
Local DTDs are interpreted before external ones so that they can define
terms used in the external ones. Unlike most other languages, the
first definition of a markup item takes precedence over later ones. Thus
localdtds both override and inform the external ones!
The DOCTYPE defines the structure of the entity in the document
for the document to be valid.
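A document with a local DTD can be read as follows. This is a sketch, assuming Python's standard xml.dom.minidom module, which records the DOCTYPE but does not validate the document against it; the novel/title declarations are made up for the example.

```python
from xml.dom import minidom

text = """<?xml version="1.0"?>
<!DOCTYPE novel [
  <!ELEMENT novel (title)>
  <!ELEMENT title (#PCDATA)>
]>
<novel><title>War and Peace</title></novel>"""

doc = minidom.parseString(text)
doctype = doc.doctype          # the document type declaration
root = doc.documentElement     # the root element

# The name in the DOCTYPE should match the tag of the root element.
print(doctype.name, root.tagName)
```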
- markup_declaration::=element_declaration|entity_declaration|attribute_list_declaration | notation_declaration | process_indication | WS.
- element_declaration::="<!ELEMENT" element_name type_description ">".
- element_name::@name, the set of names occurring in element_declarations.
- attribute_list_declaration::="<!ATTLIST" element_name #attribute_declaration ">", attaches a set of attributes to the element named.
- attribute_declaration::=attribute_name attribute_type attribute_default.
- attribute_name::@name, the set of names appearing in attribute declarations.
- attribute_type::= "CDATA" | "ENTITY" | "ENTITIES" | "NMTOKEN" | "NMTOKENS" | "ID" | "IDREF" | "IDREFS".
- attribute_default::= required | implied | fixed | default_value.
- default_value::=literal data token.
- required::="#REQUIRED", implies that the element must specify a value and so no default is needed.
- implied::="#IMPLIED", no default is given and no value has to be given. Note however if the attribute name is mentioned it must be assigned a value.
- fixed::="#FIXED" default_value, meaning that the default is also the only value and so cannot be changed in any occurrence.
- entity_declaration::="<!ENTITY" O("%") entity_name entity_meaning ">".
- entity_name::@name, the set of names occurring in entity_declarations.
These add new entities. An entity is an abbreviation. Some (with the '%')
are to be used in DTDs and are expanded there. They are written as
"%"entity_name";" and are replaced by the associated entity_meaning as the
DTD is elaborated. Others (with no "%") are ready to be
used in an actual XML document in the form "&"entity_name";".
- notation_declaration::="<!NOTATION" TBA ">".
- CDATA_section::= "<![CDATA[" TBA "]]>".
- pcdata::="#PCDATA", keyword indicating a block of parsed character data -- but no XML style marking up.
- identifier
More TBA.
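The effect of an entity declaration and of an attribute default can be observed with a non-validating parser. This is a sketch for trusted input only, assuming Python's standard xml.etree.ElementTree module (built on Expat); the memo element, the org entity and the status attribute are invented for the illustration.

```python
import xml.etree.ElementTree as ET

text = """<!DOCTYPE memo [
  <!ENTITY org "Object Management Group">
  <!ATTLIST memo status CDATA "draft">
]>
<memo>Sent by the &org;.</memo>"""

memo = ET.fromstring(text)

print(memo.text)            # the entity reference is replaced by its meaning
print(memo.get("status"))   # the default value is supplied by the parser
```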
W3C specifications
[ REC-xml-19980210 ]
and Tim Bray's Annotated Specification
[ axml.html ]
- FOP::= See http://www.jtauber.com/fop/,
XSL to PDF converter.
- XT::= See http://www.jclark.com/xml/xt.html,
processes XSL transformations.
- IBM
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCea89bdEb/ ]
- Apache XML Project's Xerces Java
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCfa89bdEb/ ]
- James Clark's XP
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCga89bdEb/ ]
- Microstar's Ælfred
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCha89bdEb/ ]
- Sun's Java API for XML
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCja89bdEb/ ]
- Oracle's XML parser
[ http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCka89bdEb/ ]
(End of section Parsers)
- namespace_rules::= See http://www.w3.org/TR/RECxml-names.
Lars Marius Garshol <larsga@ifi.uio.no> wrote (comp.text.xml, 13 May 1999)
"The namespace URI does not point
to anything meaningful, it's just a globally unique identifier. So your application will have to understand the DTD to make use of its elements. It would need that even if the URI did refer to a
DTD. But at least they are now identified as being fitting elements,
and your application can make a decision as to whether it should just
ignore them or whether it should try to support them."
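The point that a namespace URI is only an identifier can be seen in how a parser reports names. This is a sketch, assuming Python's standard xml.etree.ElementTree module; the two example.org URIs are hypothetical and are never fetched.

```python
import xml.etree.ElementTree as ET

text = (
    '<doc xmlns="http://example.org/ns/book" '
    'xmlns:m="http://example.org/ns/math">'
    "<title>Notes</title><m:formula>x</m:formula></doc>"
)

root = ET.fromstring(text)

# Each tag is reported as "{namespace URI}local name"; the prefix is gone.
tags = [root.tag] + [child.tag for child in root]
print(tags)

# An application chooses its own prefixes when it looks elements up.
ns = {"m": "http://example.org/ns/math"}
formula = root.find("m:formula", ns)
print(formula.text)
```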
- API::="Application Programming Interface".
- DAML::=DARPA Agent Markup Language,
[ http://www.daml.org/ ]
- DOM::="Document Object Model".
- DTD::="Document Type Definition",
[ DTD in comp.text.SGML ]
- ebXML::=Electronic Business eXtensible Markup Language,
[ http://www.ebxml.org/ ]
- HTML::markup_language= HTML_glossary & HTML_syntax.
- HTML_glossary::= See http://cse.csusb.edu/dick/samples/comp.html.glossary.html.
- HTML_syntax::= See http://cse.csusb.edu/dick/samples/comp.html.syntax.html.
- language::="a set of syntactic and semantic rules defining the correct form, structure, and meaning of strings of characters", the chief product of computer science research.
- ML::="in an acronym often indicates a markup_language" | "a programming language".
- markup_language::language="a language that describes how to mark up text to give it added meaning, richness, or layout and style".
(optional):
- For x, O(x)::= an optional x.
- OReilly_Books::= See http://www.xml.com.
- P::stylesheet_language,
[ Thot ]
the Thot structured document language and the P stylesheet language.
- PSL::stylesheet_language, part of the Proteus library and style sheet library.
[ ~multimedia ]
- RDF::="Resource Description Framework",
[ http://www.w3.org/RDF/ ]
- RSS::="a lightweight multipurpose extensible metadata description and syndication format", "a Semantic Web vocabulary",
[ http://web.resource.org/rss/1.0/ ]
- SAML::=Security Assertion Markup Language,
[ tc_home.php?wg_abbrev=security ]
- SAX::="Simple API for XML".
- Schema::= See http://www.w3.org/XML/Schema,
[/w3/org/TR/xmlschema-0]
- SensorML::=Sensor Modeling Language,
[ http://vast.uah.edu/SensorML/ ]
- SGML::markup_language="Standard Generalized Markup Language",
[ comp.text.SGML.html] .
[ sgml.html ]
- stylesheet::="A description in a special stylesheet_language of the way a user or client wants some data interpreted and/or displayed".
- stylesheet_language::="A computer language defining how to specify the style for displaying or processing a document".
- SVG::=Scalable Vector Graphics,
[ http://www.w3.org/TR/SVG/ ]
- SMIL::=Synchronized Multimedia Integration Language,
[ AudioVideo] .
- SOAP::= See http://www.w3.org/TR/soap,
[ REC-soap12-part0-20030624 ]
- TBA::="To Be Announced".
- VML::=Vector Markup Language,
[ http://www.w3.org/TR/1998/NOTE-VML-19980513/ ]
- VoiceXML::= See http://www.w3.org/TR/2003/CR-voicexml20-20030220/
[ http://www.voicexml.org ]
- WSDL::=Web Services Description Language,
[ default.asp] .
[ wsdl ]
- XHTML::= See http://www.w3.org/TR/xhtml1,
Extensible HTML -- a version of HTML that follows the
rules of XML.
[ http://www.xhtml.org/ ]
[ html_xhtml.asp ]
(thanks to Eric October 19th 2012 for this link).
- XML::markup_language="eXtensible Markup Language".
[ http://www.w3.org/XML/ ]
See the BNF syntax XMLBNF above
or the W3C specs
[ http://www.w3.org/TR/1998/REC-xml-19980210/ ]
or Tim Bray's Annotated Specification
[ axml.html ]
or the Italian translation
http://www.xml.it/REC-xml-19980210-it.html
(OASIS): Organization for the Advancement of Structured Information Standards.
- XML.org::= See http://www.xml.org,
- XSL::stylesheet_language="eXtensible Stylesheet Language".
[ http://www.w3.org/TR/REC-xml/ ]
- element::=an identifiable(and so tagged) piece of data.
- entity::=a string that symbolizes a character | something that contains data.
The Annotated XML Spec at
[ axml.html ]
Mapping runtime objects into XML formatted data
[ XML_Serialization ]
[ xtal.html ]