What Is XML?


By Mark Volkmann, OCI Principal Software Engineer

August 1999

XML, which stands for Extensible Markup Language, is a markup language that lets authors define custom "tags" similar to HTML tags. It is a subset of another markup language, Standard Generalized Markup Language (SGML), which is generally considered too complex for widespread use.

One of the main goals of XML is to separate document content from its formatting. HTML combines content and formatting, resulting in documents that are difficult to use as sources of data for software applications. 
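
As a small illustration of content without formatting, the sketch below parses a document made only of custom tags and hands the data to a program. The tag names (book, title, author, price) and the function read_book are invented for this example, and Python's standard xml.etree.ElementTree module is just one of many possible parsers.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag name here is invented for illustration.
# Note that it says nothing about fonts, colors, or layout.
BOOK_XML = """\
<book>
  <title>XML Basics</title>
  <author>Jane Doe</author>
  <price currency="USD">29.95</price>
</book>
"""

def read_book(xml_text):
    """Return the book's data as a plain dictionary."""
    root = ET.fromstring(xml_text)
    price = root.find("price")
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "price": float(price.text),
        "currency": price.get("currency"),
    }
```

Because the document carries only data, the same file could feed a report, a search index, or a web page without any changes.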

XML documents are plain text and therefore human-readable, unlike databases and many binary file formats. This makes XML a very portable data format.

XML documents can be easily processed by programming languages, such as Java and C++, for which XML parsers have been written. IBM, Sun, Microsoft, and others offer XML parsers that are currently freely available.

The tags that are allowed in a particular XML document and their nesting relationships can be limited by a Document Type Definition (DTD). Many industries are in the process of defining standard DTDs that will simplify data interchange between related applications.
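
To give a feel for the kind of rule a DTD expresses, here is a rough sketch in Python. It is not a DTD validator: the table ALLOWED_CHILDREN and the function find_violations are hypothetical, and they check only which tags may appear inside which, not their order or how many times they occur. The DTD declarations in the comment show what the equivalent rules might look like.

```python
import xml.etree.ElementTree as ET

# A DTD that these rules loosely mirror (shown for reference only):
#   <!ELEMENT catalog (book+)>
#   <!ELEMENT book (title, author)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT author (#PCDATA)>
# Hypothetical rule table: each element maps to the child tags it may contain.
ALLOWED_CHILDREN = {
    "catalog": {"book"},
    "book": {"title", "author"},
    "title": set(),
    "author": set(),
}

def find_violations(element, rules=ALLOWED_CHILDREN):
    """Return messages for tags that are unknown or nested in the wrong place."""
    if element.tag not in rules:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in rules[element.tag]:
            problems.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            problems.extend(find_violations(child, rules))
    return problems
```

A validating XML parser performs checks like these (and more) automatically when a document declares its DTD.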

How are XML documents displayed?

XML documents can be displayed in web browsers.

If a browser supports XML, then XML documents can be displayed by referencing their URLs, just as is done for HTML documents. 

For browsers that do not support XML, an XML document can be translated to HTML on the web server. Java servlets can be used to do this.
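
The server-side translation could look something like the following sketch. It is written in Python rather than as a Java servlet, and the catalog/book/title/author tags and the function catalog_to_html are assumptions made for the example. A real server would call a function like this and send the resulting HTML to the browser.

```python
import html
import xml.etree.ElementTree as ET

def catalog_to_html(xml_text):
    """Render a hypothetical <catalog> of <book> elements as an HTML list."""
    root = ET.fromstring(xml_text)
    items = []
    for book in root.findall("book"):
        # Escape the text so characters such as & and < stay valid in HTML.
        title = html.escape(book.findtext("title", default=""))
        author = html.escape(book.findtext("author", default=""))
        items.append(f"<li><em>{title}</em> by {author}</li>")
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"
```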

How is the content of an XML document formatted?

Style sheets can be applied to XML documents to format them for output. 

There are two popular kinds of style sheets: Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL).

CSS is much simpler than XSL but lacks many of its features. CSS simply matches specific XML tags and applies a given set of style properties (such as fonts, colors, and spacing) to their content.

XSL has two parts: formatting and transformation.

XSL's formatting capabilities are similar to those of CSS. The transformation capability allows document content to be filtered and reordered through its own pattern-matching mechanism and scripting language.
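
The effect of filtering and reordering can be sketched in plain Python. This is not XSL and does not use an XSL processor; it only shows the kind of result a transformation might produce. The function cheap_books_by_title and the catalog/book/title/price tags are invented for the example.

```python
import xml.etree.ElementTree as ET

def cheap_books_by_title(xml_text, max_price):
    """Keep books priced at or below max_price and sort them by title."""
    source = ET.fromstring(xml_text)
    # Filter: select only the <book> elements that match the condition.
    kept = [book for book in source.findall("book")
            if float(book.findtext("price")) <= max_price]
    # Reorder: sort the selected elements by their <title> text.
    kept.sort(key=lambda book: book.findtext("title"))
    result = ET.Element("catalog")
    result.extend(kept)
    return ET.tostring(result, encoding="unicode")
```

In XSL the same intent would be expressed declaratively, as patterns that select elements and templates that say what to output for each match.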

Do I have to create my XML documents by hand?

The tool market for XML is still maturing, but there are some XML editors available now.

Many database vendors will be adding the capability to output the results of database queries in XML format. This will make it easy to display those results in a web browser. Style sheets can be used to format this output.
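
Until a database offers this directly, an application can build the XML itself. The sketch below uses Python's built-in sqlite3 module as a stand-in for any database. The function query_to_xml and the results/row tag names are hypothetical, and the code assumes every column name is also a valid XML tag name.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(connection, sql, root_tag="results", row_tag="row"):
    """Run a query and return its rows as XML, one child element per column."""
    cursor = connection.execute(sql)
    # cursor.description holds one entry per column; the name comes first.
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_element = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(row_element, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```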

How is the data in an XML document used by software applications?

There are two common APIs that can be used by software applications to process data in an XML document: Simple API for XML (SAX) and Document Object Model (DOM).

SAX is an event-driven API. As a SAX parser reads an XML document, it sends "events" to the application indicating the kinds of tags (or elements) it has encountered. Applications are written to perform specific actions in response to these events.
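
A minimal sketch of the event-driven style, using the SAX support in Python's standard library (xml.sax). The class TagCounter and the function count_tags are names chosen for this example. The parser calls startElement once for every opening tag, and the application decides what to do with each call.

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Count how many times each element name is opened."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time it encounters an opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1

def count_tags(xml_text):
    """Parse xml_text and return a dictionary of element name to count."""
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts
```

Nothing is kept in memory except what the handler chooses to store, which is why this style suits very large documents.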

DOM is a data-driven API. As a DOM parser reads an XML document, it builds a tree data structure describing the document content. This tree structure can then be traversed multiple times to extract data.
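
The tree can be explored with ordinary recursion. The sketch below uses xml.dom.minidom, a small DOM implementation in Python's standard library. The function element_names is invented for this example.

```python
from xml.dom import minidom

def element_names(node, depth=0):
    """Walk the DOM tree and return (depth, tag name) pairs for every element."""
    names = []
    if node.nodeType == node.ELEMENT_NODE:
        names.append((depth, node.tagName))
        depth += 1
    # Text nodes have no children, so the recursion stops there by itself.
    for child in node.childNodes:
        names.extend(element_names(child, depth))
    return names
```

Since the whole tree stays in memory after parsing, the function can be called on the same document as many times as needed.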

New elements can be added to the tree. Existing elements can be removed. The resulting tree can be output to create a brand new XML document. XML documents can even be created from scratch using DOM.
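
These operations might look as follows with xml.dom.minidom from Python's standard library. Both functions and the catalog/book/title tags are hypothetical. The first function assumes the document contains at least one book element directly under the root.

```python
from xml.dom import minidom

def replace_first_book(xml_text, new_title):
    """Remove the first <book>, append a new one, and return the new XML."""
    doc = minidom.parseString(xml_text)
    catalog = doc.documentElement
    # Remove an existing element from the tree.
    first = catalog.getElementsByTagName("book")[0]
    catalog.removeChild(first)
    # Add a new element to the tree.
    book = doc.createElement("book")
    title = doc.createElement("title")
    title.appendChild(doc.createTextNode(new_title))
    book.appendChild(title)
    catalog.appendChild(book)
    # Output the resulting tree as XML text.
    return catalog.toxml()

def new_catalog(titles):
    """Build a document from scratch, with one <book> per title."""
    doc = minidom.Document()
    catalog = doc.createElement("catalog")
    doc.appendChild(catalog)
    for text in titles:
        book = doc.createElement("book")
        book.appendChild(doc.createTextNode(text))
        catalog.appendChild(book)
    return catalog.toxml()
```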

Who controls the standardization of XML and related technologies?

The World Wide Web Consortium (W3C) creates specifications for nearly all web-based technologies. While the W3C doesn't control the implementation of these technologies through licensing agreements, its recommendations are well respected. Most vendors follow them rather than risk having their products perceived as non-standard.

Software Engineering Tech Trends (SETT) is a regular publication featuring emerging trends in software engineering.