JAXB: The integration of Java and XML Schema


By Paul Jensen, OCI Principal Software Engineer

April 2003


Introduction

The combination of XML and Java has been advertised as the ideal fusion of technologies - Java providing a portable executable (bytecode) format and XML providing a portable data format. Since the introduction of JDK 1.4, Java has natively supported low-level XML APIs such as DOM and SAX. JDK 1.4 also introduced serialization and deserialization of JavaBeans via the XMLEncoder and XMLDecoder classes. However, these approaches to integration have some significant limitations. Working with DOM and SAX is not type safe and is generally awkward, leading to code that is difficult to maintain, especially as XML document formats change. Alternatives such as DOM4J and JDOM offer simpler APIs, but have the same maintenance problems. The JavaBean XMLEncoder technology utilizes the properties of the JavaBean to produce XML representations of "scripts" necessary to later reproduce the JavaBean. The resultant output is not generally useful in other contexts.
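To make the type-safety concern concrete, here is a minimal sketch that reads a billing number through the DOM API. The class name DomBillingLookup, its helper method, and the sample XML string are assumptions made for illustration; only the JAXP and DOM calls come from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of any library.
public class DomBillingLookup {

    // Returns the content of the first element with the given local name
    // as a BigInteger, or null when no such element exists.
    public static BigInteger firstNumber(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The element name is only a string: a typo compiles cleanly and
            // surfaces at run time as a missing node.
            Node node = doc.getElementsByTagNameNS("*", localName).item(0);
            if (node == null) {
                return null;
            }
            // The content is a string as well; converting it is the caller's job.
            return new BigInteger(node.getTextContent().trim());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Patient id=\"Patient-1\"><billingNumber>123</billingNumber></Patient>";
        System.out.println(firstNumber(xml, "billingNumber"));
        System.out.println(firstNumber(xml, "billingNumbr")); // misspelled on purpose
    }
}
```

Nothing in this code ties the string "billingNumber" to the document format, so a renamed element goes unnoticed until the lookup returns nothing.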

The JAXB (Java Architecture for XML Binding) specification addresses these issues. JAXB provides a standardized higher-level API which combines the type safety and intuitive use of JavaBean-like objects with the ability to manipulate a wide variety of XML documents. JAXB provides for the creation of Java interfaces and classes which correspond to element and data types defined via XML Schema. These generated classes provide the ability to marshall Java instances of these classes to XML and unmarshall instances from XML. The JAXB-based Java objects may be created and manipulated in fairly standard Java fashion and validated according to the XML Schema definitions upon which they are based.

JAXB XML-Java Binding

The majority of the JAXB 1.0 specification details the mapping (binding) of XML Schema constructs to the Java language, consisting of default bindings and user-configurable binding customizations. In general, XML Schema types and elements are mapped to Java interfaces, and each vendor provides proprietary implementations of these interfaces.

This article will not attempt to fully explore these mappings; instead it presents an example and explores some of the more common mappings and customizations. The example below models some simple data related to a Patient object, presumably part of a health care system (although the example does not offer a realistic representation for such a system). While a persistent representation of a Patient would certainly exist in the form of Data Access Objects (JDOs, EJBs, etc.) in such a system, separate representations are typically required as well, such as Value Objects. The example defines very simple patient objects and walks through the basics of JAXB usage.

Defining the Schema

JAXB requires a W3C XML Schema as a starting point. Earlier versions supported DTDs, but this support has been eliminated in favor of more powerful XML Schemas. The following schema serves as the starting point for our example:

Example1.xsd

  <xs:schema
      targetNamespace="http://www.ociweb.com/jnb/april2003"
      xmlns="http://www.ociweb.com/jnb/april2003"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified">
    <xs:complexType name="PatientType">
      <xs:sequence>
        <xs:element name="billingNumber" type="xs:nonNegativeInteger"></xs:element>
        <xs:element name="Name" type="xs:string"></xs:element>
        <xs:element name="Address" type="xs:string"></xs:element>
        <xs:element name="Physician" type="xs:string"></xs:element>
      </xs:sequence>
      <xs:attribute name="id" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:ID">
            <xs:pattern value="Patient-[\S][\S]*"></xs:pattern>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
    <xs:element name="Patients">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Patient" type="PatientType" maxOccurs="unbounded"></xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>

Those unfamiliar with XML Schema may refer to the references at the end of this article for additional information. However, the above schema is fairly easy to understand. It declares a complex type named PatientType which requires four sub-elements. Name, Address, and Physician may contain a string as content, while billingNumber requires a non-negative integer. The Patients element may contain an unbounded number of Patient elements. Patient elements include a required id attribute which must match a pattern and, as an ID type, must be unique in the document. The following XML document is a valid instance of this schema:

Example1.xml

  <Patients
      xmlns="http://www.ociweb.com/jnb/april2003"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.ociweb.com/jnb/april2003 Example1.xsd">
    <Patient id="Patient-1">
      <billingNumber>123</billingNumber>
      <Name>Smith, Fred</Name>
      <Address>123 Mockingbird Ln</Address>
      <Physician>Dr. Livingston</Physician>
    </Patient>
    <Patient id="Patient-2">
      <billingNumber>456</billingNumber>
      <Name>Jones, Tom</Name>
      <Address>1020 W. Addison</Address>
      <Physician>Dr. Buehler</Physician>
    </Patient>
  </Patients>
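The constraints described above can be checked independently of JAXB. The following is a minimal sketch that uses the JDK's javax.xml.validation API (a later JAXP addition, not part of JAXB 1.0) with the schema inlined as a string. The class name PatientSchemaCheck and its helper methods are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; validates with JAXP, not JAXB.
public class PatientSchemaCheck {
    static final String NS = "http://www.ociweb.com/jnb/april2003";

    // Example1.xsd from the article, inlined so the sketch is self-contained.
    static final String XSD =
        "<xs:schema targetNamespace='" + NS + "' xmlns='" + NS + "'"
      + " xmlns:xs='http://www.w3.org/2001/XMLSchema' elementFormDefault='qualified'>"
      + "<xs:complexType name='PatientType'><xs:sequence>"
      + "<xs:element name='billingNumber' type='xs:nonNegativeInteger'/>"
      + "<xs:element name='Name' type='xs:string'/>"
      + "<xs:element name='Address' type='xs:string'/>"
      + "<xs:element name='Physician' type='xs:string'/>"
      + "</xs:sequence>"
      + "<xs:attribute name='id' use='required'><xs:simpleType>"
      + "<xs:restriction base='xs:ID'><xs:pattern value='Patient-[\\S][\\S]*'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute></xs:complexType>"
      + "<xs:element name='Patients'><xs:complexType><xs:sequence>"
      + "<xs:element name='Patient' type='PatientType' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Builds a Patients document with one Patient per id; every patient
    // shares the given billing number and fixed sample details.
    public static String patients(String billingNumber, String... ids) {
        StringBuilder xml = new StringBuilder("<Patients xmlns='" + NS + "'>");
        for (String id : ids) {
            xml.append("<Patient id='").append(id).append("'>")
               .append("<billingNumber>").append(billingNumber).append("</billingNumber>")
               .append("<Name>Smith, Fred</Name><Address>123 Mockingbird Ln</Address>")
               .append("<Physician>Dr. Livingston</Physician></Patient>");
        }
        return xml.append("</Patients>").toString();
    }

    // Returns true when the document satisfies the schema.
    public static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            // With no ErrorHandler registered, the first error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(patients("123", "Patient-1", "Patient-2")));
        System.out.println(isValid(patients("-5", "Patient-1")));               // negative number
        System.out.println(isValid(patients("123", "Bob")));                    // id misses the pattern
        System.out.println(isValid(patients("123", "Patient-1", "Patient-1"))); // duplicate ID
    }
}
```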

Binding the Schema

The next step after establishing the XML Schema is to run the binding compiler to create Java classes corresponding to the schema. Each vendor supporting JAXB will provide its own compiler and procedure. The examples here utilize the Sun JAXB Reference Implementation (RI), which is available for download as part of the Java Web Services Developer Pack (WSDP). After installing the WSDP, executing the binding compiler is a simple matter. It may be executed from the command line using xjc.bat or xjc.sh, or using the java -jar syntax with jaxb-xjc.jar. The latter seems more straightforward.

The binding compiler requires the schema file, a destination directory for the generated java files, and a package for the generated files. For the above schema (Example1.xsd), the following Ant task is executed.

  <java jar="${jaxb.lib.dir}/jaxb-xjc.jar" fork="true" dir="./xml">
    <classpath refid="jaxb.class.path"></classpath>
    <arg line="-d ${gen-src.dir}"></arg>
    <arg line="-p com.ociweb.jnb.april2003.example1"></arg>
    <arg line="Example1.xsd"></arg>
  </java>

The output of the binding compiler is the following files (under ${gen-src.dir}):

/com
    /ociweb
        /jnb
            /april2003
                /example1
                    |   bgm.ser
                    |   jaxb.properties
                    |   ObjectFactory.java
                    |   Patients.java
                    |   PatientsType.java
                    |   PatientType.java
                    |
                    /impl
                            PatientsImpl.java
                            PatientsTypeImpl.java
                            PatientTypeImpl.java

Of these files, only the .java files in the example1 package are referenced by user code. The classes in the impl package are vendor-specific implementations and the bgm.ser and jaxb.properties files are also specific to the RI. Note that all of these vendor files will be required when deploying an application.

The Java files in the package passed as the -p argument to xjc (com.ociweb.jnb.april2003.example1) are defined by the normative XML-Java binding specification for JAXB. Regardless of XML Schema contents, an ObjectFactory class is created by the binding compiler. The ObjectFactory creates instances of XML Schema-defined elements and types, allowing creation of entirely new content trees. Usage of the ObjectFactory will be shown in a later section of this article.

The remainder of the Java files define interfaces. In JAXB terminology, interface Patients defines an Element interface while interfaces PatientsType and PatientType define Content interfaces, mapping, respectively, to schema element declarations and content models. The PatientType interface reveals the intuitive binding of simple schema types to Java:

  public interface PatientType {
      java.math.BigInteger getBillingNumber();
      void setBillingNumber(java.math.BigInteger value);
      java.lang.String getAddress();
      void setAddress(java.lang.String value);
      java.lang.String getName();
      void setName(java.lang.String value);
      java.lang.String getPhysician();
      void setPhysician(java.lang.String value);
      java.lang.String getId();
      void setId(java.lang.String value);
  }
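The BigInteger return type follows from the schema: xs:nonNegativeInteger has no upper bound, so no primitive Java type can hold every legal value. A minimal sketch of the consequence, assuming a hypothetical helper class BillingNumberRange that is not part of JAXB:

```java
import java.math.BigInteger;

// Hypothetical helper for illustration; not generated by the binding compiler.
public class BillingNumberRange {

    // True when the value can be stored in a Java int without loss.
    public static boolean fitsInInt(BigInteger value) {
        // bitLength() excludes the sign bit, so 31 bits is the int limit.
        return value.bitLength() <= 31;
    }

    public static void main(String[] args) {
        BigInteger small = new BigInteger("123");
        // A legal xs:nonNegativeInteger that overflows both int and long.
        BigInteger huge = new BigInteger("123456789012345678901234567890");

        System.out.println(fitsInInt(small));
        System.out.println(fitsInInt(huge));
        // intValueExact() throws ArithmeticException instead of truncating.
        System.out.println(small.intValueExact());
    }
}
```

Any binding that narrows such a property to int therefore has to decide what happens to values outside the int range.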

The JAXB specification defines a wide variety of binding customizations. For example, method names and property types can be changed in the generated PatientType interface to map billingNumber to an int and change its name to chargeNumber. Customizations may be defined directly within the source XML Schema document or in a separate XML document passed to the binding compiler.

The JAXB Runtime Framework

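In addition to facilities and conventions for mapping XML Schema types to Java, JAXB provides general runtime APIs. This runtime framework utilizes the generated bindings to allow unmarshalling of XML documents into Java content trees, validation of those content trees, and marshalling of content trees back to XML.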

The JAXBContext class provides factory methods for objects providing the above functionality. It is constructed via one of two static newInstance() methods, one taking a String contextPath as its only argument and the other the contextPath and a ClassLoader. The contextPath consists of a colon-delimited list of all package names for which marshalling, unmarshalling, and/or validation will be performed under the given JAXBContext. The newInstance method searches this path (potentially using the provided ClassLoader) for a jaxb.properties file containing a value for the javax.xml.bind.context.factory property and returns an instance of the defined concrete implementation class. (Recall that the jaxb.properties file was created earlier by the binding compiler.)
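The lookup described above can be pictured with a small sketch. This is not the RI's implementation; the class ContextPathSketch, its methods, and the factory class name com.example.HypotheticalContextFactory are all assumptions made for illustration. Only the property key and the colon-delimited path format come from the text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration of the contextPath lookup; not JAXB code.
public class ContextPathSketch {
    static final String FACTORY_KEY = "javax.xml.bind.context.factory";

    // Turns a colon-delimited contextPath into the classpath resource
    // names at which a jaxb.properties file would be looked for.
    public static List<String> propertyFileCandidates(String contextPath) {
        List<String> resources = new ArrayList<>();
        for (String pkg : contextPath.split(":")) {
            resources.add(pkg.replace('.', '/') + "/jaxb.properties");
        }
        return resources;
    }

    // Reads the context factory class name from properties-file text.
    public static String factoryClassName(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(FACTORY_KEY);
    }

    public static void main(String[] args) {
        String path = "com.ociweb.jnb.april2003.example1:com.example.other";
        System.out.println(propertyFileCandidates(path));
        // The factory class name below is made up for this sketch.
        System.out.println(factoryClassName(
                FACTORY_KEY + "=com.example.HypotheticalContextFactory\n"));
    }
}
```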

Code fragments in the following sections demonstrate the unmarshalling, modification, validation, and marshalling of the sample XML document above.

Unmarshalling

  // Framework Initialization
  // 1
  JAXBContext jaxbContext =
      JAXBContext.newInstance("com.ociweb.jnb.april2003.example1");

  // Unmarshalling
  // 2
  Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
  // 3
  Patients patients = (Patients) unmarshaller.unmarshal(new File("xml/Example1.xml"));

The statement marked 1 obtains the JAXBContext implementation for the given package (the RI in this case). The statement marked 2 obtains an unmarshaller. The statement marked 3 unmarshalls the XML document given a File reference. The unmarshal() method is heavily overloaded, allowing the use of a File, InputStream, URL, InputSource, DOM Node, or javax.xml.transform.Source as the content source.

Reading and Altering Content

Given the root node of the XML document (Patients here), child nodes and content may be easily traversed using standard Java syntax in a type-safe manner. The following code retrieves the list of patients, prints out some information on each, and then adds a patient to the content tree. Note that reading and manipulating XML documents via JAXB is significantly easier than with the DOM and SAX alternatives. Also note the use of the ObjectFactory class to create a new instance of an XML content type (in this case a PatientType). Adding this new instance to the returned (mutable) list results in three Patient elements now being present in the content tree. The JAXB decision to bind the list in this way (rather than to include add and remove methods on the PatientsType itself) is rather disconcerting in that it violates standard principles of data hiding and encapsulation.

  // Read Content
  java.util.List patientList = patients.getPatient();
  for (Iterator itr = patientList.iterator(); itr.hasNext();) {
      PatientType patientType = (PatientType) itr.next();
      System.out.println("patientType.getId() = " + patientType.getId());
      System.out.println("patientType.getName() = " + patientType.getName());
  }

  // Alter Content
  PatientType newPatient = new ObjectFactory().createPatientType();
  newPatient.setId("Patient-3");
  newPatient.setName("Jensen, Paul");
  newPatient.setAddress("601 Long Beach");
  newPatient.setBillingNumber(new BigInteger("105"));
  newPatient.setPhysician("Dr. Owen");
  patientList.add(newPatient);
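The encapsulation concern about the live list can be shown in plain Java, with no JAXB types involved. In this sketch the classes LivePatients and GuardedPatients are hypothetical stand-ins: the first hands out its backing list the way the generated getPatient() does, while the second is one possible alternative with explicit mutators.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; not generated JAXB classes.
public class LiveListSketch {

    // Mirrors the generated style: the getter returns the live backing list.
    public static class LivePatients {
        private final List<String> patient = new ArrayList<>();

        public List<String> getPatient() {
            return patient;
        }
    }

    // Alternative: changes go through a method that can enforce rules.
    public static class GuardedPatients {
        private final List<String> patient = new ArrayList<>();

        public void addPatient(String id) {
            if (id == null || !id.startsWith("Patient-")) {
                throw new IllegalArgumentException("bad patient id: " + id);
            }
            patient.add(id);
        }

        public List<String> getPatient() {
            // Read-only view: callers cannot bypass addPatient().
            return Collections.unmodifiableList(patient);
        }
    }

    public static void main(String[] args) {
        LivePatients live = new LivePatients();
        live.getPatient().add("not even a patient id"); // nothing can object
        System.out.println(live.getPatient().size());

        GuardedPatients guarded = new GuardedPatients();
        guarded.addPatient("Patient-3");
        System.out.println(guarded.getPatient().size());
    }
}
```

With the live list, the owning object never sees the change, so any checking has to wait for a later validation pass.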

Validation

Validation of a content tree may be achieved in several ways in the JAXB framework. By default, unmarshalling does not perform validation. Validation may be enabled by calling setValidating(true) on the Unmarshaller instance. With regard to marshalling, the specification does not require that the content tree be valid in order for marshalling to succeed. It does mandate that all implementations successfully marshall valid content.

An object implementing the ValidationEventHandler interface may be registered with the Marshaller and/or Unmarshaller to specialize error handling. Otherwise, if validation is performed, all processing is halted upon encountering the first error. Validation may also be executed on-demand using a Validator object. Some JAXB implementations may choose to support fail-fast validation, which provides immediate feedback upon an invalid change to the content.

The following code uses on-demand validation to ensure the addition of a new patient was successful. The ValidationEventCollector (from package javax.xml.bind.util) will collect all validation errors and can be subsequently queried to retrieve them. It is an implementation of the ValidationEventHandler interface.

  // Validation
  Validator validator = jaxbContext.createValidator();
  validator.setEventHandler(new ValidationEventCollector());
  validator.validate(patients);
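The difference between halting at the first error and collecting every error can be tried without a JAXB implementation. The sketch below is an analogy, not the JAXB API: it uses the JDK's javax.xml.validation.Validator with a SAX ErrorHandler in the role that ValidationEventCollector plays above. The class CollectingValidation and its tiny schema are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical JAXP analogue of a collecting event handler; not JAXB code.
public class CollectingValidation {

    // Tiny made-up schema: a list of non-negative billing numbers.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='numbers'><xs:complexType><xs:sequence>"
      + "<xs:element name='billingNumber' type='xs:nonNegativeInteger'"
      + " maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates the document and returns every error message reported,
    // instead of stopping at the first one.
    public static List<String> collectErrors(String xml) {
        final List<String> messages = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }

                @Override
                public void error(SAXParseException e) {
                    messages.add(e.getMessage()); // record and keep going
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // unrecoverable, e.g. malformed XML
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            messages.add(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }

    public static void main(String[] args) {
        String bad = "<numbers><billingNumber>1</billingNumber>"
                + "<billingNumber>-2</billingNumber>"
                + "<billingNumber>-3</billingNumber></numbers>";
        for (String message : collectErrors(bad)) {
            System.out.println(message);
        }
    }
}
```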

Marshalling

The final task of this code example is to marshall the altered document. Similar to unmarshalling, the marshal() method supports a variety of destinations for marshalled content: Writer, OutputStream, DOM Node, SAX ContentHandler, or a javax.xml.transform.Result. Several properties may be set on the Marshaller to affect its output. These include character encoding, inclusion of top-level schemaLocation attributes, and pretty-printing (shown below).

The final code example demonstrates pretty-printing of the edited document to the console.

  // Marshalling
  Marshaller marshaller = jaxbContext.createMarshaller();
  marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
  marshaller.marshal(patients, System.out);

The output of the marshaller verifies the correct addition of our new element:

  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <ns1:Patients xmlns:ns1="http://www.ociweb.com/jnb/april2003">
    <ns1:Patient id="Patient-1">
      <ns1:billingNumber>123</ns1:billingNumber>
      <ns1:Name>Smith, Fred</ns1:Name>
      <ns1:Address>123 Mockingbird Ln</ns1:Address>
      <ns1:Physician>Dr. Livingston</ns1:Physician>
    </ns1:Patient>
    <ns1:Patient id="Patient-2">
      <ns1:billingNumber>456</ns1:billingNumber>
      <ns1:Name>Jones, Tom</ns1:Name>
      <ns1:Address>1020 W. Addison</ns1:Address>
      <ns1:Physician>Dr. Buehler</ns1:Physician>
    </ns1:Patient>
    <ns1:Patient id="Patient-3">
      <ns1:billingNumber>105</ns1:billingNumber>
      <ns1:Name>Jensen, Paul</ns1:Name>
      <ns1:Address>601 Long Beach</ns1:Address>
      <ns1:Physician>Dr. Owen</ns1:Physician>
    </ns1:Patient>
  </ns1:Patients>

Summary

This article has covered the basics of JAXB, providing a general understanding of the binding mechanism and a working knowledge of the runtime APIs. Much of the complexity of JAXB lies in the binding definitions and customizations (comprising the majority of the 200-page specification). As it is based on XML Schema, a complicated technology in itself, this is not surprising. (What is surprising is the unfortunate lack of support for XML Schema keys and keyrefs. IDs and IDREFs are supported.)

JAXB fills a gap in a total Java/XML integration solution, providing simpler APIs than the standard alternatives. The introduction of JAXB should serve to further promote the use of XML in Java applications by simplifying access to and manipulation of XML documents.

References



Software Engineering Tech Trends (SETT) is a regular publication featuring emerging trends in software engineering.