XML

XML (Extensible Markup Language) is a standard maintained by the World Wide Web Consortium for creating special-purpose markup languages. It is general enough that XML-based languages can describe many different kinds of data as well as text. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. It is based on SGML but is greatly simplified, while adding enhancements for portability. Languages based on XML (for example, RDF, SMIL, MathML, and SVG) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their particular form.

Strengths and Weaknesses

The features of the XML format that make it particularly appropriate for data transfer are:

The weaknesses of the format relate to matters of efficiency, since the XML

For generic, loosely-bound data transfer, the strengths outweigh the weaknesses. In many neutral applications where efficiency is not a particular concern, XML formats are also being adopted simply because tools to manipulate XML are now conveniently at hand.

Syntax rules for an XML file

XML files are plain text files. The character encoding may be declared in the XML declaration at the very start of the file. If no encoding is declared, the default is UTF-8 (or UTF-16 when a byte order mark is present); ASCII is a subset of UTF-8.
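As a rough illustration of how a parser picks up the encoding, the following sketch uses Python's standard xml.etree.ElementTree module. The two sample documents and the variable names are invented for this example; the first declares ISO-8859-1 explicitly, the second declares nothing and is therefore read as UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the byte 0xE9 is "e acute" in ISO-8859-1,
# and the XML declaration tells the parser to decode it that way.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xe9</word>'

# Hypothetical sample: no declaration, so the parser assumes UTF-8.
utf8_doc = "<word>caf\u00e9</word>".encode("utf-8")

latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

# Both documents carry the same four characters once decoded.
print(latin1_text == utf8_text)
```

Passing bytes rather than an already decoded string matters here: it leaves the decoding decision to the parser, which is the situation the declaration exists for.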

Unlike HTML, for example, XML depends heavily on structure, content and integrity for its efficacy. To be considered "well-formed", i.e. fully XML-compliant, an XML file must conform, at the very least, to the following:

Element names in XML are case-sensitive: for example, <Example> and </Example> are a well-formed matching pair, whereas <Example> and </example> are not.
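The case-sensitivity rule can be checked with any XML parser, since a parser must reject input that is not well-formed. The sketch below is one possible check using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this example, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser raises ParseError for any well-formedness violation.
        return False


# Start and end tags match exactly, including case.
matching = is_well_formed("<Example>text</Example>")

# The end tag differs in case, so it does not close <Example>.
mismatched = is_well_formed("<Example>text</example>")

print(matching, mismatched)
```

Note that this only tests well-formedness. It says nothing about whether the document matches a DTD or Schema.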

Also, again unlike HTML, XML tags explain what the data means rather than simply how to display it.

As a concrete example, a simple recipe expressed in an XML representation might be:

        <Recipe name="bread" prep_time="5 mins" cook_time="3 hours" >
            <ingredient amount="3 cups" >Flour</ingredient>
            <ingredient amount="0.25 ounce" >Yeast</ingredient>
            <ingredient amount="1 1/4 cups" >Warm Water</ingredient>
            <ingredient amount="1 teaspoon" >Salt</ingredient>
            <Instructions>Mix all ingredients together, and knead thoroughly.
                 Cover with a cloth, and leave for one hour in warm room.
                 Knead again, place in a tin, and then bake in the oven.
            </Instructions>
        </Recipe>
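Because the tags and attributes name the data, a program can pull out individual fields directly. As a minimal sketch, the following reads the recipe with Python's standard xml.etree.ElementTree module. The RECIPE string repeats the example with the instructions shortened, and the variable names are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# The recipe from the example, with the instructions shortened.
RECIPE = """<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
    <ingredient amount="3 cups">Flour</ingredient>
    <ingredient amount="0.25 ounce">Yeast</ingredient>
    <ingredient amount="1 1/4 cups">Warm Water</ingredient>
    <ingredient amount="1 teaspoon">Salt</ingredient>
    <Instructions>Mix all ingredients together, and knead thoroughly.</Instructions>
</Recipe>"""

root = ET.fromstring(RECIPE)

# Attributes of the root element are read by name.
recipe_name = root.get("name")

# Each <ingredient> child yields an (amount, name) pair.
ingredients = [(item.get("amount"), item.text)
               for item in root.findall("ingredient")]

for amount, name in ingredients:
    print(f"{amount}: {name}")
```

The lookup uses the exact spelling "ingredient"; asking for "Ingredient" would find nothing, since element names are case-sensitive.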

Document Type Definition

Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special-purpose parsers and writers. For a language based on XML, however, the software designer can specify the basic syntax by writing a Document Type Definition (DTD), or give a more detailed description using an XML Schema. Readily available (and in some cases free) XML parsers and writers understand these descriptions (http://www.w3.org/XML/#software). This may significantly reduce life-cycle development cost.

An XML file that is well-formed and also conforms to the DTD or Schema it refers to is considered a "valid document".
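The difference between well-formed and valid can be seen with a small, hypothetical DTD for the recipe format. The DTD below is an assumption written for this sketch, not a published one. The parser used here, Python's standard xml.dom.minidom, is non-validating: it reads the DTD but does not enforce it, so a document that breaks the DTD still parses as long as it is well-formed. Checking validity needs a validating parser, which the Python standard library does not provide.

```python
from xml.dom import minidom

# Hypothetical DTD: a Recipe needs one or more ingredients
# followed by exactly one Instructions element.
DOC = """<!DOCTYPE Recipe [
  <!ELEMENT Recipe (ingredient+, Instructions)>
  <!ATTLIST Recipe name CDATA #REQUIRED>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT Instructions (#PCDATA)>
]>
<Recipe name="bread">
  <ingredient amount="3 cups">Flour</ingredient>
</Recipe>"""

# Parsing succeeds: the document is well-formed.
doc = minidom.parseString(DOC)
doctype_name = doc.doctype.name

# The document is nevertheless not valid, because the DTD
# requires an Instructions element and none is present.
has_instructions = len(doc.getElementsByTagName("Instructions")) > 0

print(doctype_name, has_instructions)
```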

Displaying XML files on the web

A further adjunct to XML is the stylesheet language XSL, which lets users describe visual properties and transformations of XML data without embedding those instructions in the data itself. For display on the web, the result of the transformation is typically an HTML file, which uses CSS for rendering.

An XML file may also be rendered directly, using the stylesheet language CSS, in some browsers such as Internet Explorer 5 or Mozilla. As of January 2003 this process is not yet stable. The XML file must include a reference to a style sheet:

 <?xml-stylesheet type="text/css" href="myStyleSheet.css"?>
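The style sheet reference is an XML processing instruction placed before the root element. As a minimal sketch, it can be added programmatically with Python's standard xml.dom.minidom module; the one-element document used here is a placeholder invented for the example.

```python
from xml.dom import minidom

# Placeholder document; any well-formed XML would do.
doc = minidom.parseString('<Recipe name="bread"/>')

# Build the processing instruction: target first, then its data.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/css" href="myStyleSheet.css"')

# The instruction must come before the root element.
doc.insertBefore(pi, doc.documentElement)

xml_text = doc.toxml()
print(xml_text)
```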

While browser-based XML rendering develops, the alternative is conversion into HTML, PDF or other formats on the server. Programs like Cocoon (http://xml.apache.org/cocoon/index.html) process an XML file against a stylesheet (and can perform other processing as well) and send the output to the user's browser, without the user needing to be aware of what has happened in the background.

Components of XML

It is not compatible with DTDs (Schemas must be used).

Processing XML files

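The APIs most widely used for processing XML data from programming languages are the Simple API for XML (SAX) and the Document Object Model (DOM). SAX is used for serial processing: the parser reports each element as it is read. DOM is used for random access processing: the whole document is held in memory as a tree.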
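The contrast between the two styles can be sketched with the SAX and DOM implementations in the Python standard library (xml.sax and xml.dom.minidom). The sample document and the IngredientCounter class are assumptions made for this illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
XML = (b"<Recipe><ingredient>Flour</ingredient>"
       b"<ingredient>Yeast</ingredient>"
       b"<ingredient>Salt</ingredient></Recipe>")


class IngredientCounter(xml.sax.ContentHandler):
    """SAX style: react to each start tag as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "ingredient":
            self.count += 1


# Serial processing: one pass, nothing is kept except the counter.
handler = IngredientCounter()
xml.sax.parseString(XML, handler)
sax_count = handler.count

# Random access: the whole tree is in memory, so any node
# can be reached directly, here the third ingredient.
doc = minidom.parseString(XML)
ingredients = doc.getElementsByTagName("ingredient")
third_ingredient = ingredients[2].firstChild.data

print(sax_count, third_ingredient)
```

SAX suits large documents that need only one pass, since memory use does not grow with document size. DOM is more convenient when the program must revisit or modify parts of the document.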

An XSL processor may be used to render an XML file for display or printing. XSL formatting objects describe page layout and are commonly used to produce print formats such as PDF. XSLT is for transforming XML into other formats, including HTML, other XML vocabularies, and any plain-text format.

The native file format of OpenOffice.org is XML. Some parts of Microsoft Office 11 will also be able to edit XML files with a user-supplied Schema (but not a DTD). Dozens of other XML editors are available.

Versions of XML

The first version of XML was XML 1.0.

The latest official version of XML is 1.1. XML 1.1 (also known as Blueberry) extends XML 1.0 by adding support for new characters in Unicode 3.0, and by fixing an oversight which led to XML not supporting EBCDIC end of line conventions.

There are also discussions of an XML 2.0, although it remains to be seen whether such a version will ever appear. XML-SW (SW for skunk works), written by one of the original developers of XML, contains some proposals for what an XML 2.0 might look like: elimination of DTDs from the syntax, and integration of Namespaces, XML Base and XML Information Set into the base standard.

wikipedia.org dumped 2003-03-17 with terodump