.Open eXtensible Markup Language
. Introduction
Somewhere
.See acknowledgement1
between the complexity of $SGML and the rigidity of $HTML lies the eXtensible Markup Language ($XML). $XML lets you describe the structure of a document. In return, you must have a Document Type Definition ($DTD) before you can process an $XML document properly.

$XML documents can be $well_formed if they follow some simple rules that allow them to be parsed. These rules are outlined below. A $well_formed document can also be $valid if it matches a Document Type Definition ($DTD). The DTD has to be declared at the start of the document along with things like the $XML version and the character encoding. There are many different DTDs for different purposes.

$XML documents can be processed if the processing is described in $XSL. To display an $XML document you need to supply some kind of mapping into a particular "style". Thus we now have style sheet languages: $XSL, $PSL, $P, ... See the W3C information
.See http://www.w3.org/Style
on style sheets and style sheet languages. The W3 Consortium supports the Web. The independent site
.See http://w3schools.com/
provides a family of tools for learning the technology.
. Examples of Document Types
SimpleNovel::=http://www.megginson.com/texts/darkness/novel.dtd.
(MathML): Structure of mathematical formulae.
.See http://www.w3.org/TR/REC-MathML
with syntax
.See http://www.w3.org/TR/REC-MathML/appendixE.html
+ $XML dsssl $stylesheets rtf tex jade
.See http://www.nag.co.uk/projects/openmath/mml-files
(OpenMath):
.See http://www.openmath.org/
(DITA): Darwin Information Typing Architecture (DITA XML)
.See http://xml.coverpages.org/dita.html
(CML): Chemical Markup Language --
.See http://www.xml-cml.org/
(W3D): Replacement for the Virtual Reality Modeling Language.
.See http://www.vrml.org/Specifications
(HRMML): $XML based Human Resource Management Markup Language:
.See http://www.structuredmethods.com/hrxml/main.html
(DocBook): Structure of software documentation. DocBook is (in 1999) actually an $SGML based way to document software. See
.See http://www/dick/samples/comp.text.SGML.html#DocBook
for more information.
(XMI): XMI::="Presents meta-data for modeling objects", by CORBA and the Object Management Group.
.Hole The following needs an Id and password
.See ftp://ftp.pmg.org/pub/docs/ad/98-1005.pdf
XMI is also integrated with the Unified Modeling Language 1.3 standard
.See http://www.csci.csusb.edu/dick/samples/uml.html
(CBL): $XML Common Business Library
.See http://www.veosystems.com/xml/cbl/cblfaq.html
There are many more sample DTDs at
.See http://xmltree.com/resources.html?keys=*5266
. XML examples
.See http://www.xmltree.com/
(XMLRepository.com):
.See http://xmlrepository.com/
. XML Links
XML information by Dick Baldwin
.See http://xml.about.com/
About.com is the new name for the organization that was previously known as The Mining Company.
(XML Query Language - Frequently Asked Questions):xml
.See http://metalab.unc.edu/xql/
(Human Resources Markup Language):xml
.See http://www.hr-xml.org/channels/home.htm
(MathML Files: DSSSL style sheet):xml
.See http://www.nag.co.uk/projects/openmath/mml-files
(Cover pages documentation):xml
.See http://www.oasis-open.org/cover/xml.html
(FAQ):xml
.See http://www.ucc.ie/xml/
(XML-QL: A Query Language for XML):xml
.See http://www.w3.org/TR/NOTE-xml-ql/
(XML and web services at DDJ):languages
.See http://www.ddj.com/topics/xml/
. Well-Formed Documents
First, XML is like $HTML; however, there are vital differences:
.Box
None of the tags used in HTML are predefined in XML.
You can add new tags to XML.
XML is case sensitive.
In XML, white space is significant.
XML is not about layout and look-and-feel. It is about structure and meaning.
Five predefined entities: gt(>), lt(<), quot("), amp(&), apos(').
End tags are never omitted. ...
There is a special kind of tag which does not enclose any content: <.../>
Comments look like this: <!-- ... -->
Processing instructions can be embedded: <? ... ?>
Attributes always have a name and a value, and the value is always quoted: name="value" or name='value'.
.Close.Box
. Syntax of Well Formed Documents
Here is a simple description of all documents that might be in XML -- ignoring the context dependencies:
XMLBNF::=following,
.Net
After a $prolog comes a single element called the root, and then some miscellaneous stuff that is probably meaningless:
document::= $prolog $root $miscellaneous.
prolog::= xml_type #$comment dtd.
A well formed document must start with a $prolog that identifies the version of XML it uses. For example
.As_is <?xml version="1.0"?>
identifies version 1.0 of XML. The prolog should also identify the character encoding -- especially if you need to use any non-"ASCII" characters. Namespaces are identified by attributes on elements, usually the root:
.As_is xmlns="....".
xml_namespace::lexeme="xmlns".
root::= $tagged_element | $empty_element.
miscellaneous::= #(comment | processing | WS).
WS::=`white space`.
tagged_element::= "<" $tag #$attribute ">" $content "</" $tag ">",
-- the tag at the start and end must be the same. To be valid the tag must be defined in a $DTD and have attributes and $content that match the rules in the $DTD. A tagged element contains other data -- between the two tags.
.As_is War and Peace
empty_element::="<" empty_tag #attribute "/>".
.As_is
singleton::= $empty_element.
content::= #( $parsed_data | $element | $comment ),
the valid sequences of pieces in a content are described by a regular expression form in the $DTD.
An element is either a tagged element or an empty_element:
(element)|- element==>$tagged_element | $empty_element.
parsed_data::= #(char ~ ("<" | ">" | "&" | ";" | "'") | $entity ).
entity::= $predefined_entity | $defined_entity.
predefined_entity::=$gt | $lt | $quot | $amp | $apos,
.Box
gt::="&gt;", stands for ">".
lt::="&lt;", stands for "<".
quot::="&quot;", stands for "\"".
amp::="&amp;", stands for "&".
apos::="&apos;", stands for "'".
.Close.Box
comment::= "<!--" #char "-->", where the enclosed text must not contain "--".
.As_is
attribute::= name "=" quoted_value.
.As_is date="1999/06/22"
.As_is time='11"11'
quoted_value::=quotes value quotes | apostrophe value apostrophe.
quotes::="\"".
apostrophe::="'".
defined_entity::=`defined in prolog`.
parsed_data::=`defined in prolog`.
tag::=`defined in prolog or namespace`,
|- tag ==> $O( namespace ":") $name.
name::=`defined in prolog`.
value::=`defined in prolog`.
.Close.Net XMLBNF
The actual rule for quoting is a little more complex in that the quote character cannot appear inside the value:
quoted_value::= |[ q:quotes|apostrophe ]( q #(char~q) q ),
that is, the union, with q equal to quotes or apostrophe, of q #(char~q) q.
. Validity
To be valid the entities, tags and their attributes must match a set of rules given in a $DTD. Suppose that we specify a $DTD that has a set of normal tag names `T` and a set of content free (empty element) tag names `C`. For each tag `t:T|C` we have a set of attribute names N(`t`). For each tag `t:T|C`, `q`:quotes|apostrophe, and attribute name `n:N(t)`, we have a set of valid values `V(t,n,q)`. Let `D` be the raw data in our document. Then define
a(t)::= ![n:N(t), q](n "=" q V(t,n,q) q),
a sequence of names with valid quoted values, and
c(t, e)::=`an expression describing the valid content of tag t in terms of elements e`,
and then an element of type t is defined by
e(t)::= ("<" t a(t) ">" c(t, e) "</" t ">") | [t:C]( "<" t a(t) "/>"),
and an element is the union over all tags
element::= D | |[t:T](e(t)).
.Box Note
There is a trick above... the content expression `c(t,e)` depends on all the elements as a function associating tag names to elements of that type. It is probably best to think of this as an array or vector indexed by tag names. The resulting grammar is context dependent but can be formalized using only a small variation of context free grammars.
The "data" (`D` above) can include elements that indicate some processing to be done to the data like this "<? ... ?>".
processing::= "<?" name $O($WS #char) "?>", where the enclosed text must not contain "?>".
It is possible to name things (like files of data or strings) and use the names in place of the things -- but the rules are a little convoluted.
.Close.Box
. Document Type Declarations
The `dtd` above is a document type declaration and has many forms. Here are some simple ones:
dtd::= "<!DOCTYPE" $WS name $O($WS externalId) $O$WS $O(localdtd) ">".
externalId::= ("PUBLIC" | "SYSTEM") $WS string_identifying_a_dtd_file.
localdtd::= "[" #(markup_declaration| ... | $WS) "]" $O$WS.
Local DTDs are interpreted before external ones so that they can define terms used in the external ones. Unlike most other languages, the first declaration of an entity or attribute overrides the later ones. Thus local DTDs both override and inform the external ones!
The DOCTYPE defines the structure that the document must have to be valid.
markup_declaration::=element_declaration | entity_declaration | attribute_list_declaration | notation_declaration | process_indication | $WS.
element_declaration::="<!ELEMENT" $WS element_name $WS content_expression $O$WS ">", where content_expression is the regular expression form describing the valid content of the element.
element_name::@name, the set of names occurring in element_declarations.
attribute_list_declaration::="<!ATTLIST" $WS element_name #($WS attribute_declaration) $O$WS ">", attaches a set of attributes to the element named.
attribute_declaration::=attribute_name attribute_type attribute_default.
attribute_name::@name, the set of names appearing in attribute declarations.
attribute_type::= "CDATA" | "ENTITY" | "NMTOKEN" | "NMTOKENS" | "ID" | "IDREF" | "IDREFS".
.Table type	Syntax of Attribute values	Semantics
.Row CDATA	quoted character data	block of text
.Row ENTITY	$TBA	Name of data
.Row NMTOKEN	$TBA	$TBA
.Row NMTOKENS	$TBA	$TBA
.Row ID	$identifier	Can be used as an IDREF
.Row IDREF	$identifier	refers to another ID
.Row IDREFS	many IDREFs	$TBA
.Close.Table
attribute_default::= $required | $implied | $fixed | $default_value.
default_value::=`literal data token`.
required::="#REQUIRED", implies that the element must specify a value and so no default is needed.
implied::="#IMPLIED", no default is given and no value has to be given. Note however that if the attribute name is mentioned it must be assigned a value.
fixed::="#FIXED" default_value, meaning that the default is also the only value and so cannot be changed in any occurrence.
entity_declaration::="<!ENTITY" $WS $O("%" $WS) entity_name $WS entity_meaning $O$WS ">".
entity_name::@name, the set of names occurring in entity_declarations.
These add new entities. An entity is an abbreviation. Some (with the '%') are to be used in DTDs and are expanded there. They are written as "%"$entity_name";" and are replaced by the associated $entity_meaning as the DTD is elaborated. Others (with no "%") are ready to be used in an actual XML document in the form "&"$entity_name";".
notation_declaration::="<!NOTATION" $WS name $WS externalId $O$WS ">".
CDATA_section::= "<![CDATA[" #char "]]>", where the enclosed text must not contain "]]>".
pcdata::="#PCDATA", keyword indicating a block of parsed character data -- but no XML style marking up.
identifier More $TBA.
. Standards on the WWW
W3C specifications
.See http://www.w3.org/TR/1998/REC-xml-19980210
and Tim Bray's Annotated Specification
.See http://www.xml.com/axml/axml.html
. Tools
FOP::=http://www.jtauber.com/fop/, $XSL to PDF converter.
XT::=http://www.jclark.com/xml/xt.html, processes $XSL transformations.
.Open Parsers
IBM
.See http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCea89bdEb/
Apache XML Project's Xerces Java
.See http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCfa89bdEb/
James Clark's XP
.See http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCga89bdEb/
Microstar's Aelfred
.See http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCha89bdEb/
Sun's Java API for XML
.See http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCja89bdEb/
Oracle's XML parser
.See http://click.softwaredevelopment.email-publisher.com/maaac9gaaQhCka89bdEb/
.Close Parsers
. Namespaces
namespace_rules::=http://www.w3.org/TR/REC-xml-names.
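A minimal sketch of how a parser treats names declared with xmlns, assuming Python's standard xml.etree.ElementTree module. The document text, the URIs under example.org, and the helper names qualified_names and qualified_attributes are hypothetical and chosen only for illustration. The parser folds each namespace URI into the name as {uri}local and handles the URI as a plain string without retrieving it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the URIs and element names are made up for illustration.
DOC = """<?xml version="1.0"?>
<library xmlns="http://example.org/books" xmlns:m="http://example.org/meta">
  <book m:id="b1">War and Peace</book>
</library>"""


def qualified_names(text):
    """Return the namespace-qualified tag of every element, in document order."""
    root = ET.fromstring(text)
    return [element.tag for element in root.iter()]


def qualified_attributes(text):
    """Return every attribute in the document, keyed by its qualified name."""
    root = ET.fromstring(text)
    return {
        name: value
        for element in root.iter()
        for name, value in element.attrib.items()
    }


# Unprefixed elements pick up the default namespace; the m: prefix is
# replaced by the URI it was bound to. The xmlns declarations themselves
# do not appear as ordinary attributes.
for name in qualified_names(DOC):
    print(name)
print(qualified_attributes(DOC))
```

An application that recognizes the string http://example.org/books can decide to handle those elements; one that does not can skip them by comparing the qualified names.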
Lars Marius Garshol wrote (comp.text.xml, 13 May 1999) "The namespace URI does not point to anything meaningful, it's just a globally unique identifier. So your application will have to understand the DTD to make use of its elements. It would need that even if the URI did refer to a DTD. But at least they are now identified as being fitting elements, and your application can make a decision as to whether it should just ignore them or whether it should try to support them."
. Glossary
API::="Application Programming Interface".
DAML::=DARPA Agent Markup Language,
.See http://www.daml.org/
DOM::="Document Object Model".
DTD::="Document Type Definition",
.See http://www/dick/samples/comp.text.SGML.html#DTD
ebXML::=Electronic Business eXtensible Markup Language,
.See http://www.ebxml.org/
HTML::$markup_language= $HTML_glossary & $HTML_syntax.
HTML_glossary::=http://www/dick/samples/comp.html.glossary.html.
HTML_syntax::=http://www/dick/samples/comp.html.syntax.html.
language::="a set of syntactic and semantic rules defining the correct form, structure, and meaning of strings of characters", the chief product of computer science research.
ML::="in an acronym often indicates a markup_language" | "a programming language".
markup_language::$language="a language that describes how to mark up text to give it added meaning, richness, or layout and style".
(optional): For x, O(x) ::= `an optional x`.
OReilly_Books::=http://www.xml.com.
P::$stylesheet_language,
.See http://www.inrialpes.fr/opera/Thot
the Thot structured document language and the P $stylesheet language.
PSL::$stylesheet_language, part of the Proteus library and style sheet library.
.See http://www.cs.uwm.edu/~multimedia
RDF::="Resource Description Framework",
.See http://www.w3.org/RDF/
RSS::="a lightweight multipurpose extensible metadata description and syndication format", "a Semantic Web vocabulary",
.See http://web.resource.org/rss/1.0/
SAML::=Security Assertion Markup Language,
.See http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=security
SAX::="Simple $API for XML".
Schema::=http://www.w3.org/XML/Schema,
.See http://www.w3.org/TR/xmlschema-0
SensorML::=Sensor Modeling Language,
.See http://vast.uah.edu/SensorML/
SGML::$markup_language="Standard Generalized Markup Language",
.See http://www/dick/samples/comp.text.SGML.html.
.See http://xml.coverpages.org/sgml.html
stylesheet::="A description in a special $stylesheet_language of the way a user or client wants some data interpreted and/or displayed".
stylesheet_language::="A computer language defining how to specify the style for displaying or processing a document".
SVG::=Scalable Vector Graphics,
.See http://www.w3.org/TR/SVG/
SMIL::=Synchronized Multimedia Integration Language,
.See http://www.w3.org/AudioVideo.
SOAP::=http://www.w3.org/TR/soap,
.See http://www.w3.org/TR/2003/REC-soap12-part0-20030624
TBA::="To Be Announced".
VML::=Vector Markup Language,
.See http://www.w3.org/TR/1998/NOTE-VML-19980513/
VoiceXML::=http://www.w3.org/TR/2003/CR-voicexml20-20030220/
.See http://www.voicexml.org
WSDL::=Web Services Description Language,
.See http://www.w3schools.com/wsdl/default.asp.
.See http://www.w3.org/TR/wsdl
XHTML::=http://www.w3.org/TR/xhtml1, Extensible HTML -- a version of HTML that follows the rules of XML.
.See http://www.xhtml.org/
.See http://www.w3schools.com/xhtml/
XML::$markup_language="eXtensible Markup Language".
.See http://www.w3.org/XML/
See the BNF syntax $XMLBNF above or the W3C specs
.See http://www.w3.org/TR/1998/REC-xml-19980210/
or Tim Bray's Annotated Specification
.See http://www.xml.com/axml/axml.html
or the Italian translation http://www.xml.it/REC-xml-19980210-it.html
(OASIS): Organization for the Advancement of Structured Information Standards.
XML.org::=http://www.xml.org,
XSL::$stylesheet_language="eXtensible $stylesheet Language".
.See http://www.w3.org/TR/REC-xml/
element::=`an identifiable (and so tagged) piece of data`.
entity::=`a string that symbolizes a character` | `something that contains data`.
. See Also
The Annotated XML Spec at
.See http://www.xml.com/axml/axml.html
Mapping runtime objects into XML formatted data
.See http://www.xml.com/xml/pub/Guide/XML_Serialization
.See http://www.zeigermann.de/xtal.html
.Close eXtensible Markup Language
. acknowledgement1
Thanks to "Edward Szumski" .
. acknowledgement2
Thanks to Larry Evans for correcting my many other errors.
. acknowledgement3
Thanks to Mark Doernhoefer for his excellent series of articles "Surfing the Net for Software Engineering Notes", and in particular for the URLs in SIGSOFT V29n3 (May 2004) pp15-24.