Writing API for XML (WAX)

By R. Mark Volkmann, OCI Partner

September 2008


Contents

Introduction  ·  WAX Tutorial  ·  WAX Limitations  ·  WAX Details  ·  Approaches Compared  ·  Simple API for XML (SAX)  ·  Document Object Model (DOM)  ·  JDOM  ·  Groovy  ·  XMLStreamWriter  ·  Conclusion

Introduction

What's the best way to read a large XML document? Of course you'd use a SAX parser or a pull parser. What's the best way to write a large XML document? Building a DOM structure to describe a large XML document won't work because it won't fit in memory. Even if it did, it's not a simple API to use. There hasn't been a solution that is simple and memory efficient until now.

Writing API for XML (WAX) is a free, open-source library for writing XML documents. I created it because I got an OutOfMemoryError while trying to output a large XML document from an application I wrote using JDOM, another Java-based XML library. I searched for other libraries that could write large XML documents but couldn't find any that were as simple to use as I thought they should be.

WAX is released under the LGPL with the intention of making its use unencumbered. It is well-tested and ready for production use. The WAX home page is at http://java.ociweb.com/mark/programming/WAX.html. Java and Ruby versions are available now. The Java version of WAX can be downloaded from Google Code at http://code.google.com/p/waxy/. The Ruby version of WAX can be downloaded from RubyForge at http://rubyforge.org/projects/waxy/ and can be installed by running "gem install wax". Ports for other programming languages will follow.

WAX has the following characteristics:

WAX Tutorial

This section provides many examples of using WAX. Each code snippet is followed by the output it produces.

When the no-arg WAX constructor is used, XML is written to standard output. There are also WAX constructors that take a java.io.OutputStream or a java.io.Writer object.

Here's a simple example where only a root element is written:

WAX wax = new WAX();
wax.start("car").close();
<car/>

After a WAX object is closed, a new one must be created in order to write more XML. In the examples that follow, assume that has been done.

Let's write a root element with some text inside:

wax.start("car").text("Prius").end().close();
<car>Prius</car>

The end method terminates the element that is started by the start method. In this case it's not necessary to call end because the close method terminates all unterminated elements.

Let's put the text inside a child element:

wax.start("car").start("model").text("Prius").close();
<car>
  <model>Prius</model>
</car>

Let's do the same with the child convenience method, which is equivalent to calling start, text and end:

wax.start("car").child("model", "Prius").close();
<car>
  <model>Prius</model>
</car>

Let's put text containing all the special XML characters in a CDATA section:

wax.start("car").start("model").cdata("1<2>3&4'5\"6").close();
 
<car>
  <model>
    <![CDATA[1<2>3&4'5"6]]>
  </model>
</car>

Let's output the XML without indentation, on a single line:

wax.setIndent(null);
wax.start("car").child("model", "Prius").close();
 
<car><model>Prius</model></car>

Let's indent the XML with four spaces instead of the default of two:

wax.setIndent("    "); // can also call setIndent(4)
wax.start("car").child("model", "Prius").close();
<car>
    <model>Prius</model>
</car>

Let's add an attribute:

wax.start("car").attr("year", 2008).child("model", "Prius").close();
 
<car year="2008">
  <model>Prius</model>
</car>

Attributes must be specified before any content for their element is specified. For example, calling start, attr and text is valid, but calling start, text and attr is not. If this rule is violated then an IllegalStateException is thrown.

Let's add an XML declaration:

WAX wax = new WAX(WAX.Version.V1_0);
wax.start("car").attr("year", 2008)
   .child("model", "Prius").close();
 
<?xml version="1.0" encoding="UTF-8"?>
<car year="2008">
  <model>Prius</model>
</car>

Let's add a comment:

wax.comment("This is a hybrid car.")
   .start("car").child("model", "Prius").close();
 
<!-- This is a hybrid car. -->
<car>
  <model>Prius</model>
</car>

Let's add a processing instruction:

wax.processingInstruction("target", "data")
   .start("car").attr("year", 2008)
   .child("model", "Prius").close();
<?target data?>
<car year="2008">
  <model>Prius</model>
</car>

Let's associate an XSLT stylesheet with the XML. The xslt method is a convenience method for adding this commonly used processing instruction:

wax.xslt("car.xslt")
   .start("car").attr("year", 2008)
   .child("model", "Prius").close();
<?xml-stylesheet type="text/xsl" href="car.xslt"?>
<car year="2008">
  <model>Prius</model>
</car>

Let's associate a default namespace with the XML:

wax.start("car").attr("year", 2008)
   .namespace("http://www.ociweb.com/cars")
   .child("model", "Prius").close();
<car year="2008"
  xmlns="http://www.ociweb.com/cars">
  <model>Prius</model>
</car>

Let's associate a non-default namespace with the XML:

String prefix = "c";
wax.start(prefix, "car").attr("year", 2008)
   .namespace(prefix, "http://www.ociweb.com/cars")
   .child(prefix, "model", "Prius").close();
<c:car year="2008"
  xmlns:c="http://www.ociweb.com/cars">
  <c:model>Prius</c:model>
</c:car>

Like attributes, namespaces must be specified before any content for their element is specified. If this rule is violated then an IllegalStateException is thrown.

Let's associate an XML Schema with the XML:

wax.start("car").attr("year", 2008)
   .namespace(null, "http://www.ociweb.com/cars", "car.xsd")
   .child("model", "Prius").close();
<car year="2008"
  xmlns="http://www.ociweb.com/cars"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xsi:schemaLocation="http://www.ociweb.com/cars car.xsd">
  <model>Prius</model>
</car>

Let's associate multiple XML Schemas with the XML:

wax.start("car").attr("year", 2008)
   .namespace(null, "http://www.ociweb.com/cars", "car.xsd")
   .namespace("m", "http://www.ociweb.com/model", "model.xsd")
   .child("m", "model", "Prius").close();
<car year="2008"
  xmlns="http://www.ociweb.com/cars"
  xmlns:m="http://www.ociweb.com/model"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xsi:schemaLocation="http://www.ociweb.com/cars car.xsd
    http://www.ociweb.com/model model.xsd">
  <m:model>Prius</m:model>
</car>

Let's associate a DTD with the XML:

wax.dtd("car.dtd")
   .start("car").attr("year", 2008)
   .child("model", "Prius").close();
<!DOCTYPE car SYSTEM "car.dtd">
<car year="2008">
  <model>Prius</model>
</car>

Let's add and use entity definitions:

String url = "http://www.ociweb.com/xml/";
wax.entityDef("oci", "Object Computing, Inc.")
   .externalEntityDef("moreData", url + "moreData.xml")
   .start("root")
   .text("The author works at &oci; in St. Louis, Missouri.",
       true, false) // turning escaping off for entity reference
   .text("&moreData;", true, false)
   .close();
<!DOCTYPE root [
  <!ENTITY oci "Object Computing, Inc.">
  <!ENTITY moreData SYSTEM "http://www.ociweb.com/xml/moreData.xml">
]>
<root>
  The author works at &oci; in St. Louis, Missouri.
  &moreData;
</root>

A common usage pattern is to pass a WAX object to a method of model objects that use it to write their XML representation. For example, a Car class could have the following method.

public void toXML(WAX wax) {
    wax.start("car")
       .attr("year", year)
       .child("make", make)
       .child("model", model)
       .end();
}
 

An example of the XML this would produce follows:

<car year="2008">
  <make>Toyota</make>
  <model>Prius</model>
</car>

A Person class whose objects hold a reference to an Address object could have the following method.

public void toXML(WAX wax) {
    wax.start("person")
       .attr("birthdate", birthdate)
       .child("name", name);
    address.toXML(wax);
    wax.end();
}

The Address class could have the following method.

public void toXML(WAX wax) {
    wax.start("address")
       .child("street", street)
       .child("city", city)
       .child("state", state)
       .child("zip", zip)
       .end();
}

An example of the XML this would produce follows:

 
<person birthdate="4/16/1961">
  <name>R. Mark Volkmann</name>
  <address>
    <street>123 Some Street</street>
    <city>Some City</city>
    <state>MO</state>
    <zip>12345</zip>
  </address>
</person>
 

WAX Limitations

WAX only helps with writing XML, not reading it. To read large XML documents, use a SAX parser (such as Xerces, which comes with Java) or a pull parser (such as Woodstox). My preference is a pull parser.
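As a rough sketch of the pull-parser style of reading, the following uses the javax.xml.stream API that ships with Java 6. The class name PullReadDemo, the method countElements and the sample XML are made up for illustration. A real application would pass an InputStream or Reader over a large file rather than a String.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullReadDemo {

    // Counts elements with the given name. Only the current parse event
    // is examined; no tree of the whole document is built.
    static int countElements(String xml, String name) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            // Wrapped so callers don't need a throws clause.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cars><car>Prius</car><car>Insight</car></cars>";
        System.out.println(countElements(xml, "car")); // prints 2
    }
}
```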

WAX doesn't verify that the XML it outputs is valid according to some schema.

WAX shines when you need to output arbitrary XML that doesn't necessarily map cleanly to objects from existing Java classes. However, there are even simpler approaches if you are serializing Java objects into XML and will later want to deserialize the XML back to Java objects. My favorite of these is XStream. Another option is JAXB.

WAX Details

So what does WAX actually do?

WAX writes out bits of XML as calls are made. It doesn't buffer up the data in a data structure to be written out later, as is done in the DOM approach. The only exceptions are the following five cases, none of which involve a large amount of data.

  1. Entity definitions, specified before the root element, are held in a list and written out in a DOCTYPE just before the root element start tag is output. Once this is done, the list is cleared.
  2. Associations between namespace URIs and XML Schema paths, specified using the namespace method, are held in a map. This information is needed to construct the value of the xsi:schemaLocation attribute. After each start tag is completed, the map is cleared.
  3. The names of unterminated ancestor elements are held in a stack. This is needed so they can be properly terminated when the end method is invoked. Calling end pops the name off the stack. The close method calls end for each name remaining on this stack in order to terminate all unterminated elements.
  4. The namespace prefixes that are defined for each element are held in a stack. As each element is terminated, an entry is popped off this stack. This is used to verify that all namespace prefixes used on elements and attributes are in scope.
  5. All namespace prefixes used on the current element or its attributes are held in a list. When the start tag for the current element is closed, all the prefixes in this list are checked to verify that a matching namespace declaration is in scope. This is necessary because a namespace can be defined on the same element that uses the prefix for itself and/or its attributes. After the prefixes are verified, the list is cleared.
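The third case, the stack of unterminated element names, can be illustrated with a minimal sketch. This is not the WAX source. The names ElementStackDemo and TinyWriter are hypothetical, there is no escaping or indentation, and a StringBuilder stands in for the output stream that a real writer would write to immediately.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ElementStackDemo {

    // Hypothetical mini-writer, not the WAX implementation.
    static class TinyWriter {
        private final StringBuilder out = new StringBuilder();
        // Only the names of unterminated elements are remembered.
        private final Deque<String> open = new ArrayDeque<String>();

        TinyWriter start(String name) {
            out.append('<').append(name).append('>');
            open.push(name);
            return this;
        }

        TinyWriter text(String text) {
            out.append(text); // no escaping in this sketch
            return this;
        }

        TinyWriter end() {
            if (open.isEmpty()) {
                throw new IllegalStateException("no element to end");
            }
            out.append("</").append(open.pop()).append('>');
            return this;
        }

        // Terminates every element still on the stack.
        String close() {
            while (!open.isEmpty()) {
                end();
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        String xml = new TinyWriter()
            .start("car").start("model").text("Prius").close();
        System.out.println(xml); // prints <car><model>Prius</model></car>
    }
}
```

The memory held at any moment is proportional to the nesting depth, not to the size of the document.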

It is not possible for WAX to output XML that isn't well-formed without an exception being thrown, unless you forget to call the close method when finished. If an exception is thrown then the tags already output may not be terminated. All exceptions thrown by WAX are runtime exceptions. IOExceptions are wrapped by RuntimeException.

WAX keeps track of the current state of the document in order to provide extensive error checking. There are four states:

  1. IN_PROLOG - The start tag for the root element hasn't been output yet.
  2. IN_START_TAG - The start tag of the current element has been written, but the > or /> at the end hasn't been written yet so attributes and namespace declarations can still be added.
  3. AFTER_START_TAG - A > has been written at the end of the start tag for the current element so it's ready for content.
  4. AFTER_ROOT - The root element has been terminated. Only comments and processing instructions can be output now.

WAX uses the current state to determine whether specific method calls are valid. For example, if the state is IN_PROLOG, it doesn't make sense to call the attr method. That adds an attribute to an element, but you haven't written any elements yet if you're still in the prolog section of the XML document.

When the state is IN_START_TAG, many methods trigger termination of the start tag. These include: cdata, child, close, comment, end, nltext, pi, start and text. This happens because none of these things can be written inside a start tag. Methods that do not cause a start tag to be terminated include attr and namespace, because these are things that belong in a start tag.
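The state checks described above can be sketched as a small validator. This is only an illustration under simplifying assumptions, not the WAX source. The names StateDemo and StateTracker are hypothetical, only four operations are modeled, and nothing is actually written.

```java
class StateDemo {

    enum State { IN_PROLOG, IN_START_TAG, AFTER_START_TAG, AFTER_ROOT }

    // Hypothetical validator; it tracks state only and writes nothing.
    static class StateTracker {
        private State state = State.IN_PROLOG;
        private int depth = 0; // number of unterminated elements

        State getState() { return state; }

        void start() {
            if (state == State.AFTER_ROOT) {
                throw new IllegalStateException("root element already terminated");
            }
            depth++;
            state = State.IN_START_TAG;
        }

        void attr() {
            if (state != State.IN_START_TAG) {
                throw new IllegalStateException("attr is not valid in state " + state);
            }
        }

        void text() {
            if (state == State.IN_PROLOG || state == State.AFTER_ROOT) {
                throw new IllegalStateException("text is not valid in state " + state);
            }
            // A real writer would close the pending start tag here.
            state = State.AFTER_START_TAG;
        }

        void end() {
            if (depth == 0) {
                throw new IllegalStateException("no element to end");
            }
            depth--;
            state = (depth == 0) ? State.AFTER_ROOT : State.AFTER_START_TAG;
        }
    }

    public static void main(String[] args) {
        StateTracker t = new StateTracker();
        t.start(); // IN_START_TAG
        t.attr();  // still IN_START_TAG
        t.text();  // AFTER_START_TAG
        try {
            t.attr(); // too late for attributes
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        t.end();
        System.out.println(t.getState()); // prints AFTER_ROOT
    }
}
```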

WAX remembers the namespace declarations that are in-scope and verifies that only in-scope namespace prefixes are used on elements and attributes.
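One way such scope tracking could work is to keep one set of declared prefixes per unterminated element. The sketch below assumes that design and is not taken from the WAX source. PrefixScopeDemo, PrefixScopes and their method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class PrefixScopeDemo {

    // Hypothetical helper, not the WAX implementation.
    static class PrefixScopes {
        // One set of declared prefixes per unterminated element.
        private final Deque<Set<String>> scopes = new ArrayDeque<Set<String>>();

        void startElement() {
            scopes.push(new HashSet<String>());
        }

        void declare(String prefix) {
            if (scopes.isEmpty()) {
                throw new IllegalStateException("no open element");
            }
            scopes.peek().add(prefix);
        }

        void endElement() {
            scopes.pop(); // declarations on this element go out of scope
        }

        boolean inScope(String prefix) {
            for (Set<String> declared : scopes) {
                if (declared.contains(prefix)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        PrefixScopes scopes = new PrefixScopes();
        scopes.startElement();   // <c:car xmlns:c="...">
        scopes.declare("c");
        scopes.startElement();   // <c:model>
        System.out.println(scopes.inScope("c")); // prints true
        scopes.endElement();
        scopes.endElement();
        System.out.println(scopes.inScope("c")); // prints false
    }
}
```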

The close method terminates all unterminated elements and closes the stream to which the XML is being written. This is done so subsequent code can't write additional content that would result in XML that isn't well-formed.

All methods that write a part of the XML output return the WAX object on which they are invoked to support method chaining. Methods that configure WAX, including setIndent and setTrustMe, do not.

When method chaining is used, compile-time type checking verifies that each successive call is valid in the context of the previous call. For example, it's not valid to call attr immediately after calling text. This is accomplished through a novel approach suggested by Brian Gilstrap at OCI. WAX methods that return the WAX object return it as one of many interface types that are implemented by the WAX class rather than as the WAX class type. The interface returned describes only the WAX methods that are valid to invoke next. Note that this allows IDEs to flag invalid method chaining call sequences as code is entered.

There is a downside to method chaining. If a method in the chain throws an exception, it may not be apparent which call threw it, since the chain could invoke the same method multiple times.
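The idea of returning a narrower interface type can be shown with a stripped-down sketch. The interface and class names here (StartTag, Content, ChainWriter) are hypothetical and far simpler than the real WAX interfaces. They only demonstrate how the compiler can reject a call to attr after text.

```java
class ChainingDemo {

    // Returned while the start tag is open: attributes and content are allowed.
    interface StartTag {
        StartTag attr(String name, String value);
        Content text(String text);
    }

    // Returned once content has been written: attr is deliberately absent.
    interface Content {
        Content text(String text);
        String close();
    }

    // One class implements both interfaces, as the WAX class does with its own.
    static class ChainWriter implements StartTag, Content {
        private final StringBuilder out = new StringBuilder();
        private final String name;
        private boolean startTagOpen = true;

        private ChainWriter(String name) {
            this.name = name;
            out.append('<').append(name);
        }

        public StartTag attr(String attrName, String value) {
            out.append(' ').append(attrName)
               .append("=\"").append(value).append('"');
            return this;
        }

        public Content text(String text) {
            if (startTagOpen) {
                out.append('>');
                startTagOpen = false;
            }
            out.append(text); // no escaping in this sketch
            return this;
        }

        public String close() {
            out.append("</").append(name).append('>');
            return out.toString();
        }
    }

    static StartTag start(String name) {
        return new ChainWriter(name);
    }

    public static void main(String[] args) {
        String xml = start("car").attr("year", "2008").text("Prius").close();
        System.out.println(xml); // prints <car year="2008">Prius</car>

        // The next line would not compile, because Content has no attr method:
        // start("car").text("Prius").attr("year", "2008");
    }
}
```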

The following UML diagram conveys all the details behind this. Note the interface types that are implemented by the WAX class. Most WAX methods specify one of these interfaces as their return type.

[Figure: WAX UML class diagram]

Approaches Compared

In the next several sections, I'll compare several approaches for writing XML. For each approach, I'll produce the following XML, referred to as the "target XML." Note that it includes both a DOCTYPE (associating a DTD with the XML) and a schemaLocation attribute (associating an XML Schema with the XML). It's not normal to do both, but I want to demonstrate how both are accomplished.

<?xml version="1.0" encoding="UTF-8"?>
<!-- This is one of my favorite CDs! -->
<?xml-stylesheet type="text/xsl" href="cd.xslt"?>
<!DOCTYPE cd SYSTEM "http://www.ociweb.com/xml/cd.dtd">
<cd year="2008"
  xmlns="http://www.ociweb.com/music"
  xmlns:date="http://www.ociweb.com/date"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xsi:schemaLocation="http://www.ociweb.com/music http://www.ociweb.com/xml/cd.xsd
    http://www.ociweb.com/date http://www.ociweb.com/xml/date.xsd">
  <artist name="Gardot, Melody">
    <title>Worrisome Heart</title>
    <date:purchaseDate>4/3/2008</date:purchaseDate>
  </artist>
</cd>

Here's the WAX code that produces the example XML above.

import com.ociweb.xml.WAX;
 
public class CDDemo {
 
    public static void main(String[] args) {
        // Write to System.out with an XML declaration that specifies version 1.0.
        // If the version is omitted then no XML declaration will be written.
        WAX wax = new WAX(WAX.Version.V1_0);
 
        wax.comment("This is one of my favorite CDs!")
           .xslt("cd.xslt")
           .dtd("cd", "http://www.ociweb.com/xml/cd.dtd")
 
           .start("cd")
           .attr("year", 2008)
           // null signifies the default namespace
           .namespace(null, "http://www.ociweb.com/music",
               "http://www.ociweb.com/xml/cd.xsd")
           .namespace("date", "http://www.ociweb.com/date",
               "http://www.ociweb.com/xml/date.xsd")
 
           .start("artist")
           .attr("name", "Gardot, Melody")
           .child("title", "Worrisome Heart")
           .child("date", "purchaseDate", "4/3/2008")
 
           .close(); // terminates all unterminated elements
    }
}

This is much more compact and understandable than any other approach I have seen.

Simple API for XML (SAX)

Readers familiar with SAX, the Simple API for XML, may have noticed a similarity between WAX methods and those in the SAX ContentHandler interface (processingInstruction, startElement, characters, endElement, startPrefixMapping and endPrefixMapping).

SAX is normally used for reading XML. However, it can also be used to write XML when it is used in conjunction with the Transformation API for XML (TrAX). TrAX is supported out of the box with Java in the javax.xml.transform package. Unfortunately, writing XML with SAX is a bit complicated. First, you need to write a class that acts as a custom org.xml.sax.XMLReader. That's an interface with a lot of methods to be implemented. This can be simplified by instead writing a class that extends org.xml.sax.helpers.XMLFilterImpl. The example that follows uses many private convenience methods that greatly simplify the code in the parse method.

import java.util.HashMap;
import java.util.Map;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;

public class CustomXMLReader extends org.xml.sax.helpers.XMLFilterImpl {

    private AttributesImpl attrs = new AttributesImpl();
    private ContentHandler contentHandler;
    private Map<String, String> prefixToURIMap = new HashMap<String, String>();

    @Override
    public void setContentHandler(ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
    }

    @Override
    public void parse(InputSource input) throws SAXException {
        contentHandler.startDocument();
        contentHandler.processingInstruction(
            "xml-stylesheet", "type=\"text/xsl\" href=\"cd.xslt\"");

        String musicURI = "http://www.ociweb.com/music";
        String musicXSD = "http://www.ociweb.com/xml/cd.xsd";
        String dateURI = "http://www.ociweb.com/date";
        String dateXSD = "http://www.ociweb.com/xml/date.xsd";

        startNamespace("", musicURI);
        startNamespace("date", dateURI);
        startNamespace("xsi", "http://www.w3.org/1999/XMLSchema-instance");
        attr("xsi", "schemaLocation",
            musicURI + ' ' + musicXSD + ' ' + dateURI + ' ' + dateXSD);
        attr("year", 2008);
        start("cd");

        attr("name", "Gardot, Melody");
        start("artist");

        start("title");
        characters("Worrisome Heart");
        end("title");

        start("date", "purchaseDate");
        characters("4/3/2008");
        end("date", "purchaseDate");

        end("artist");
        end("cd");

        endNamespace("date");
        contentHandler.endDocument();
    }

    private void attr(String name, Object value) {
        attr("", name, value);
    }

    private void attr(String prefix, String localName, Object value) {
        String uri = prefixToURIMap.get(prefix);
        String qName =
            prefix.length() == 0 ? localName : prefix + ':' + localName;
        attrs.addAttribute(uri, localName, qName, "CDATA", value.toString());
    }

    private void characters(String text) throws SAXException {
        char[] chars = text.toCharArray();
        contentHandler.characters(chars, 0, chars.length);
    }

    private void end(String name) throws SAXException {
        end("", name);
    }

    private void end(String prefix, String localName)
    throws SAXException {
        String uri = prefixToURIMap.get(prefix);
        String qName =
            prefix.length() == 0 ? localName : prefix + ':' + localName;
        contentHandler.endElement(uri, localName, qName);
    }

    private void endNamespace(String prefix) throws SAXException {
        prefixToURIMap.remove(prefix);
        contentHandler.endPrefixMapping(prefix);
    }

    private void start(String name) throws SAXException {
        start("", name);
    }

    private void start(String prefix, String localName) throws SAXException {
        String uri = prefixToURIMap.get(prefix);
        String qName =
            prefix.length() == 0 ? localName : prefix + ':' + localName;
        contentHandler.startElement(uri, localName, qName, attrs);
        attrs.clear();
    }

    private void startNamespace(String prefix, String uri) throws SAXException {
        prefixToURIMap.put(prefix, uri);
        contentHandler.startPrefixMapping(prefix, uri);
    }
}

Second, you need to write a class that uses JAXP to "transform" SAX events from the custom XMLReader into XML output. Here's an example.

import java.io.OutputStreamWriter;
import javax.xml.transform.*;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;

public class SAXWriter {

    public static void main(String[] args) throws SAXException,
    TransformerConfigurationException, TransformerException {

        SAXSource source = new SAXSource();
        // Note use of the custom XMLReader here.
        source.setXMLReader(new CustomXMLReader());

        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setAttribute("indent-number", 2);
        Transformer transformer = tf.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Using a Writer is key to getting indentation to work!
        Result result =
            new StreamResult(new OutputStreamWriter(System.out));
        transformer.transform(source, result);
    }
}

As complicated as the above code is, it would be even more complicated to get it to output the DOCTYPE and comment that are in our target XML, so I skipped those. This code produces the output below. The start tag, including all the namespace declarations and the schemaLocation attribute, is written on one long line. I've split it up using a backslash ("\") to indicate line continuation in order to make it easier to read. That's not in the real XML.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="cd.xslt"?>
 
<cd xmlns="http://www.ociweb.com/music" \
xsi:schemaLocation="http://www.ociweb.com/music http://www.ociweb.com/xml/cd.xsd \
http://www.ociweb.com/date http://www.ociweb.com/xml/date.xsd" \
year="2008" xmlns:date="http://www.ociweb.com/date" \
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
  <artist name="Gardot, Melody">
    <title>Worrisome Heart</title>
    <date:purchaseDate>4/3/2008</date:purchaseDate>
  </artist>
</cd>

Clearly using WAX to write XML is much easier than using SAX and TrAX.

There's another issue with this approach. It doesn't stream the output! Instead the Transformer uses SAX events from the SAXSource to build an in-memory DOM tree. I had hoped to avoid that by using a SAXSource. No such luck!

Document Object Model (DOM)

DOM is a programming language neutral API for reading and writing XML. It is defined by a W3C recommendation. DOM doesn't stream XML when writing it. Instead, it builds an in-memory data structure that is later written as XML to a destination. This makes it unsuitable for writing large XML documents.

The next library discussed, JDOM, is an attempt to create a similar library that is specific to Java and is much easier to use. Rather than show example DOM code, I'll show example JDOM code in the next section. It is much shorter and simpler than the equivalent DOM code would be.
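For readers who want a feel for the DOM style anyway, here is a minimal sketch that writes only the small car document from the tutorial, not the target XML. It uses the DOM and TrAX classes bundled with Java. The class name DomWriteDemo and the method buildCarXml are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomWriteDemo {

    static String buildCarXml() {
        try {
            // Build the entire tree in memory first.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element car = doc.createElement("car");
            car.setAttribute("year", "2008");
            doc.appendChild(car);
            Element model = doc.createElement("model");
            model.appendChild(doc.createTextNode("Prius"));
            car.appendChild(model);

            // Only now is any XML written, using an identity transform.
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            // Wrapped so callers don't need a throws clause.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildCarXml());
    }
}
```

Even for two elements, the tree must be built in full before output begins, which is the property that makes this approach unsuitable for large documents.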

For more information on DOM, visit http://www.w3.org/DOM/.

JDOM

JDOM is a free, open source, Java library for reading and writing XML. It can be obtained from http://www.jdom.org/.

Like DOM, JDOM doesn't stream XML when writing it. Instead, it builds an in-memory data structure that is later written as XML to a destination. This makes it unsuitable for writing large XML documents. Here's an example that outputs the target XML.

import java.io.IOException;
import org.jdom.*;
import org.jdom.output.*;

public class CDDemo {

    public static void main(String[] args) throws IOException {
        // Create namespaces to be used.
        String url = "http://www.ociweb.com/";
        Namespace dateNS = Namespace.getNamespace("date", url + "date");
        Namespace musicNS = Namespace.getNamespace(url + "music");
        Namespace xsiNamespace = Namespace.getNamespace(
            "xsi", "http://www.w3.org/1999/XMLSchema-instance");

        String rootName = "cd";

        // Create the Document.
        Document doc = new Document();
        doc.addContent(new Comment(" This is one of my favorite CDs! "));
        doc.addContent(new ProcessingInstruction(
            "xml-stylesheet", "type=\"text/xsl\" href=\"cd.xslt\""));
        doc.setDocType(new DocType(rootName, url + "xml/cd.dtd"));

        // Create the root element and define namespaces.
        Element root = new Element(rootName);
        doc.setRootElement(root);
        root.setNamespace(musicNS); // sets default namespace
        root.addNamespaceDeclaration(dateNS);
        root.addNamespaceDeclaration(xsiNamespace);

        // Associate XML Schemas with this XML.
        // Each namespace URI is separated from its schema path by a space.
        String schemaLocation =
            musicNS.getURI() + " " + url + "xml/cd.xsd " +
            dateNS.getURI() + " " + url + "xml/date.xsd";
        root.setAttribute("schemaLocation", schemaLocation, xsiNamespace);

        // Create other elements and attributes.
        root.setAttribute("year", "2008");
        Element artist = new Element("artist", musicNS);
        artist.setAttribute("name", "Gardot, Melody");
        root.addContent(artist);
        artist.addContent(
            new Element("title", musicNS).setText("Worrisome Heart"));
        Element purchaseDate =
            new Element("purchaseDate", dateNS).setText("4/3/2008");
        artist.addContent(purchaseDate);

        // Output the XML.
        XMLOutputter xo = new XMLOutputter(Format.getPrettyFormat());
        xo.output(doc, System.out);
    }
}

While this code is much shorter and is easier to understand than the equivalent DOM code would be, it pales in comparison to the earlier WAX code.

Groovy

Groovy "builder" classes can write XML. Two to consider are MarkupBuilder and StreamingMarkupBuilder. While both can output XML, they have limitations.

MarkupBuilder can't do the following:

Here's an example of using MarkupBuilder to output the target XML.

import groovy.xml.MarkupBuilder

// Pass an IndentPrinter to the MarkupBuilder constructor
// in order to output indented XML.
// Without this all the XML will be on a single line.
// Even with this, long sequences of attributes are not indented.
// A PrintWriter can be passed to the IndentPrinter constructor.
// Without that it writes to standard output.
def builder = new MarkupBuilder(new IndentPrinter())

def url = 'http://www.urlweb.com'

builder.cd(
    xmlns : "${url}/music",
    'xmlns:date' : "${url}/music/date",
    'xmlns:xsi' : 'http://www.w3.org/1999/XMLSchema-instance',
    'xsi:schemaLocation' :
        "${url}/music ${url}/xml/cd.xsd ${url}/date ${url}/xml/date.xsd",
    year : '2008') {
    artist(name : 'Gardot, Melody') {
        title('Worrisome Heart')
        'date:purchaseDate'('4/3/2008')
    }
}

This code produces the output below. Again, I've split it up using a backslash ("\") to indicate line continuation in order to make it easier to read. That's not in the real XML.

<cd xmlns='http://www.urlweb.com/music' \
xmlns:date='http://www.urlweb.com/music/date' \
xmlns:xsi='http://www.w3.org/1999/XMLSchema-instance' \
xsi:schemaLocation='http://www.urlweb.com/music http://www.urlweb.com/xml/cd.xsd \
http://www.urlweb.com/date http://www.urlweb.com/xml/date.xsd' year='2008'>
  <artist name='Gardot, Melody'>
    <title>Worrisome Heart</title>
    <date:purchaseDate>4/3/2008</date:purchaseDate>
  </artist>
</cd>

StreamingMarkupBuilder can't do the following:

Here's an example of using StreamingMarkupBuilder to output the target XML. A big thank you goes out to Mike Easter, of Code To Joy fame, for writing this!

import groovy.xml.StreamingMarkupBuilder

def builder = new StreamingMarkupBuilder()
builder.encoding = 'UTF-8'

def url = 'http://www.urlweb.com'

def cd = {
    cd('xsi:schemaLocation' :
        "${url}/music ${url}/xml/cd.xsd ${url}/date ${url}/xml/date.xsd",
        year : '2008') {
        artist( name : 'Gardot, Melody' ) {
            title('Worrisome Heart')
            date.purchaseDate('4/3/2008')
        }
    }
}

def xmlDoc = {
    mkp.xmlDeclaration()
    mkp.comment(' This is one of my favorite CDs! ')
    unescaped << '\n'
    // Single quotes disable interpolation, so ${url} is written literally.
    unescaped << '<!DOCTYPE cd SYSTEM "${url}/xml/cd.dtd">\n'
    mkp.pi("xml-stylesheet" : 'type="text/xsl" href="cd.xslt"')
    unescaped << '\n'

    mkp.declareNamespace('' : "${url}/music")
    mkp.declareNamespace('date' : "${url}/date")
    mkp.declareNamespace('xsi' : "http://www.w3.org/1999/XMLSchema-instance")

    out << cd
}

println builder.bind(xmlDoc)

This code produces the output below. Again, I've split it up using a backslash ("\") to indicate line continuation in order to make it easier to read. That's not in the real XML.

<?xml version="1.0" encoding="UTF-8"?>
<!-- This is one of my favorite CDs! -->
<!DOCTYPE cd SYSTEM "${url}/xml/cd.dtd">
<?xml-stylesheet type="text/xsl" href="cd.xslt"?>
<cd xsi:schemaLocation='http://www.urlweb.com/music http://www.urlweb.com/xml/cd.xsd \
http://www.urlweb.com/date http://www.urlweb.com/xml/date.xsd' year='2008' \
xmlns='http://www.urlweb.com/music' xmlns:date='http://www.urlweb.com/date' \
xmlns:xsi='http://www.w3.org/1999/XMLSchema-instance'> \
<artist name='Gardot, Melody'><title>Worrisome Heart</title> \
<date:purchaseDate>4/3/2008</date:purchaseDate></artist></cd>

The Groovy code is certainly more compact and easier to understand than the SAX code. However, it isn't as clear as the WAX code and suffers from several limitations.

XMLStreamWriter

XMLStreamWriter is a class in the javax.xml.stream package that is included in Java 6. To use it with Java 5, download Woodstox from http://woodstox.codehaus.org/.

The following code uses XMLStreamWriter to produce the target XML.

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class CDDemo {

    public static void main(String[] args) throws XMLStreamException {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();

        // Output destination can be specified with an OutputStream or Writer.
        XMLStreamWriter xsm = factory.createXMLStreamWriter(System.out);

        String url = "http://www.ociweb.com/";
        xsm.setPrefix("date", url + "date");

        xsm.writeStartDocument(); // writes XML declaration
        xsm.writeComment(" This is one of my favorite CDs! ");
        String root = "cd";
        String doctype =
            "<!DOCTYPE " + root + " SYSTEM \"" + url + "xml/cd.dtd\">";
        xsm.writeDTD(doctype);
        xsm.writeProcessingInstruction(
            "xml-stylesheet", "type=\"text/xsl\" href=\"cd.xslt\"");
        xsm.writeStartElement(root);
        xsm.writeDefaultNamespace(url + "music");
        // The namespace URI must match the one passed to setPrefix above.
        xsm.writeNamespace("date", url + "date");
        xsm.writeNamespace("xsi", "http://www.w3.org/1999/XMLSchema-instance");
        xsm.writeAttribute("xsi:schemaLocation",
            url + "music " + url + "xml/cd.xsd " +
            url + "date " + url + "xml/date.xsd");
        xsm.writeAttribute("year", "2008");

        xsm.writeStartElement("artist");
        xsm.writeAttribute("name", "Gardot, Melody");
        xsm.writeStartElement("title");
        xsm.writeCharacters("Worrisome Heart");
        xsm.writeEndElement();
        xsm.writeStartElement(url + "date", "purchaseDate");
        xsm.writeCharacters("4/3/2008");

        xsm.close(); // terminates unterminated elements just like WAX
    }
}

This approach is much closer to the WAX approach than the others examined here. However, there are several issues with XMLStreamWriter.

Conclusion

Being the author of WAX, perhaps I'm a bit biased. I think it's clear that WAX is easier to use than the other approaches examined here. Another important characteristic of WAX is that it uses very little memory compared to other approaches. I'd love to hear about other approaches I should have considered. Feel free to send me email at mark@ociweb.com.