Introduction to XQuery

Weiqi Gao, OCI Principal Software Engineer

JANUARY 2004


Introduction

XQuery is a strongly typed functional language for processing real or virtual XML data. Its rich data model and ergonomic expressions provide an environment where most programmers would feel at home taking apart and piecing together XML.

The XQuery specifications are prepared by the W3C XML Query Working Group, and are in the Last Call Working Draft stage. Hopefully they will achieve recommendation status in 2004.

In this article, I try to get you started exploring XQuery to see if you can take advantage of it today.

Available Implementations

The W3C XML Query Working Group page has a list of available XQuery implementations.

Saxon by Michael Kay and GNU Qexo by Per Bothner are two open source implementations that I find helpful. Saxon implements most of the mandatory features of the November 12, 2003 XQuery working draft. GNU Qexo, based on the GNU Kawa framework for programming languages for the JVM, supports compiling XQuery programs into Java classes. It also supports interactive sessions, which is great for learning.

To follow along with this article, download saxon7-8.zip and kawa-1.7.90.jar, unzip saxon7-8.zip, and add saxon7.jar and kawa-1.7.90.jar to the CLASSPATH.

I use these shell scripts to save some typing:

xquery:
    java net.sf.saxon.Query "$@"
 
qexo:
    java kawa.repl --xquery "$@"

Hello, World!

[weiqi@gao] $ qexo
(: 1 :) "Hello, World!"
Hello, World!
(: 2 :) 1024, 3.1416, 2.9979e8
1024 3.1416 2.9979E8
(: 3 :) <greeting from="weiqi">Hi</greeting>
<greeting from="weiqi">Hi</greeting>

The smileys are delimiters of XQuery comments. They nest. Qexo uses them as part of the prompt. Cheers you up, doesn't it?

We first entered a string "Hello, World!", and Qexo responded by printing it out. Then we entered a sequence of three numbers: an integer, a decimal, and a double. Finally we created a piece of new XML data: an element with one attribute and some content.

Sequences

All data in XQuery are sequences. A sequence is made up of items. An item can be either an atomic value or a node. A sequence with one item is the same as that item.

The comma operator combines sequences and flattens the result. You cannot create a sequence of sequences.

(: 4 :) (1024, 3.1416, 2.9979e8), ("Hello, World!", <greeting></greeting>)
 1024 3.1416 2.9979E8 Hello, World! <greeting></greeting>

Notice that parentheses are used around a sequence when only a single expression is expected, as in the above case or in function calls.
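
As a minimal sketch of the flattening rule (the file name flatten.xq is only an illustration), the following query counts the items of a nested-looking sequence and checks that a parenthesized single item is still just that item:

```xquery
(: flatten.xq :)
(: nested parentheses and the empty sequence () add no structure: 3 items :)
count((1, (2, 3), ())),
(: a sequence with one item is the same as that item: true :)
(42) instance of xs:integer
```

Run with either the xquery or the qexo script, this should report 3 and true.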

XQuery also defines the union, intersect, and except operators for sequences of nodes. Their behavior depends on the concept of node identity, which we'll cover in a later section.

Atomic Values and Simple Types

Strings, integers, decimals, and doubles are the only data types whose values can be entered into an XQuery program as literals.

They are examples of built-in W3C XML Schema types. XQuery supports all built-in W3C XML Schema types plus a few additional types. The W3C XML Schema types have names that start with xs:, for example, xs:string, xs:integer, xs:decimal, xs:double, xs:boolean, xs:float, etc. The XQuery defined types have names that start with xdt:, for example, xdt:dayTimeDuration and xdt:yearMonthDuration. Note that xs and xdt are predeclared namespace prefixes for http://www.w3.org/2001/XMLSchema and http://www.w3.org/2003/11/xpath-datatypes respectively.

The instance of operator tests the type of a value:

(: instance-of.xq :)
3.1416 instance of xs:decimal
 
[weiqi@gao] $ xquery instance-of.xq
true
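
The same operator can be used to check the types of the four kinds of literals. The sketch below (literal-types.xq is a made-up file name) also shows two details that are easy to miss: xs:integer is derived from xs:decimal, so an integer is also an instance of xs:decimal, while instance of never converts a decimal into a double.

```xquery
(: literal-types.xq :)
1024 instance of xs:integer,             (: true :)
1024 instance of xs:decimal,             (: true: xs:integer derives from xs:decimal :)
2.9979e8 instance of xs:double,          (: true :)
"Hello, World!" instance of xs:string,   (: true :)
3.1416 instance of xs:double             (: false: the literal is an xs:decimal :)
```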

Each built-in type has a constructor function that converts strings or other values into values of that type:

(: constructor.xq :)
xs:date("2003-12-31") instance of xs:date
 
[weiqi@gao] $ xquery constructor.xq
true

The cast as operator works exactly like a constructor function:

(: cast-as.xq :)
1 cast as xs:boolean,
0 cast as xs:boolean,
"true" cast as xs:boolean,
"false" cast as xs:boolean
 
[weiqi@gao] $ xquery cast-as.xq
true
false
true
false
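
To make the equivalence concrete, here is a small sketch (the file name is hypothetical) that converts the same string with a constructor function and with cast as, then does arithmetic on the result:

```xquery
(: cast-vs-constructor.xq :)
xs:integer("42") + 1,                          (: 43 :)
("42" cast as xs:integer) + 1,                 (: 43 :)
xs:double("2.9979e8") instance of xs:double    (: true :)
```

The parentheses around the cast are not strictly required, but they make the grouping obvious.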

Nodes and Node Types

XML data appear in XQuery programs as nodes. Nodes can be either created anew or selected from existing nodes using XPath expressions.

[weiqi@gao] $ qexo
(: 1 :) <greeting from="weiqi">Hello, World!</greeting>
<greeting from="weiqi">Hello, World!</greeting>
(: 2 :) document {
(: 3{:)   element { "greeting" } {
(: 4{:)     attribute { "from" } { "weiqi" },
(: 5{:)     "Hello, World!"
(: 6{:)   }
(: 7{:) }
<greeting from="weiqi">Hello, World!</greeting>

Here we created an element literally, and then created an XML document using the document, element, and attribute constructors. XQuery uses {} to surround enclosed expressions. (Notice how Qexo's prompt changes to indicate the current expression nesting.)

Nodes have types. Types exist for six kinds of nodes in XML: document-node(), element(), attribute(), processing-instruction(), comment(), and text(). The node() type represents all kinds of nodes. Namespace nodes are handled through namespace declarations. The parentheses are part of the type name, not function calls.

(: node-types.xq :)
document { <greeting/> } instance of document-node(),
element greeting { "Hello" } instance of element(),
attribute from { "weiqi" } instance of attribute()
 
[weiqi@gao] $ xquery node-types.xq
true
true
true
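
The remaining node types can be checked the same way. This is only a sketch with an invented file name; it uses direct comment and processing-instruction constructors plus a computed text constructor, and contrasts nodes with an atomic value:

```xquery
(: more-node-types.xq :)
<!-- a note --> instance of comment(),                        (: true :)
<?target data?> instance of processing-instruction(),        (: true :)
text { "Hello" } instance of text(),                          (: true :)
<greeting/> instance of node(),                               (: true: node() matches any node :)
"Hello" instance of node()                                    (: false: a string is an atomic value :)
```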

Nodes have identities. Two nodes have the same identity if and only if they are selected from the same spot in the same XML document. Newly constructed nodes always have a new identity. Identities can be tested with the is operator:

(: 8 :) <greeting/> is <greeting/>
false
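
Binding a node to a variable keeps its identity, which is what the union, intersect, and except operators rely on. A minimal sketch (identity.xq is an assumed file name):

```xquery
(: identity.xq :)
let $g := <greeting/>
return (
  $g is $g,              (: true: both operands are the same node :)
  count(($g, $g)),       (: 2: the comma operator keeps duplicates :)
  count($g union $g),    (: 1: union removes duplicates by identity :)
  count($g except $g)    (: 0: nothing is left after removing $g :)
)
```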

Input Functions

XQuery provides the doc() and collection() functions to bring external XML data into a program. The doc() function takes a URI and returns a document node. The collection() function takes a URI and returns a sequence of nodes. The collection() function interprets the URI in an implementation-specific way.

We can use doc() to input an XML document greeting.xml that contains:

<greeting from = "weiqi">Hell&#111;, World!</greeting>
 
[weiqi@gao] $ qexo
(: 1 :) doc("greeting.xml")
<greeting from="weiqi">Hello, World!</greeting>

Notice how the spaces surrounding the equal sign disappeared and how the character reference &#111; has been resolved (111 is the ASCII code for 'o'). XQuery works on the infoset of XML documents, where insignificant white space, entity references, and CDATA sections have already been resolved.

XPath Expressions

XQuery includes XPath 2.0 as a sublanguage. XPath expressions produce new node sequences out of old ones.

An XPath expression consists of one or more steps separated by / or //. Each step has an axis, a test and optional predicates.

Each step works on the result of the previous steps and produces its own results for the next step. A step goes through each node in the input sequence to generate partial results, which are then put together to form the output sequence.

Let's look at a few XPath expressions as they are applied to the XML document greetings.xml:

<?xml version="1.0" encoding="UTF-8"?>
<greetings>
  <greeting from="weiqi">Nihao!</greeting>
  <greeting from="brian">Hi!</greeting>
  <greeting from="luc">Bonjour!</greeting>
</greetings>
 
[weiqi@gao] $ qexo
(: 1 :) doc("greetings.xml")/greetings
<greetings>
  <greeting from="weiqi">Nihao!</greeting>
  <greeting from="brian">Hi!</greeting>
  <greeting from="luc">Bonjour!</greeting>
</greetings>
(: 2 :) doc("greetings.xml")//greeting
<greeting from="weiqi">Nihao!</greeting><greeting
from="brian">Hi!</greeting><greeting from="luc">Bonjour!</greeting>
(: 3 :) doc("greetings.xml")//greeting[@from="weiqi"]
<greeting from="weiqi">Nihao!</greeting>
(: 4 :) doc("greetings.xml")//greeting/@from
from="weiqi" from="brian" from="luc"
(: 5 :) doc("greetings.xml")//greeting[1]
<greeting from="weiqi">Nihao!</greeting>

Here we selected the greetings element from the child axis of the XML document, the greeting elements from the descendant axis, greeting descendants whose from attribute has the value "weiqi", the from attributes of greeting descendants, and the first greeting descendant.
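
The expressions above use abbreviated syntax. As a sketch of what the abbreviations stand for (the file name explicit-axes.xq is hypothetical), the same kind of selection can be written with explicit axis names. Note that a predicate such as [1] applies within a single step; to take the first item of a whole result, parenthesize the path first.

```xquery
(: explicit-axes.xq :)
(: the greeting element whose from attribute is "brian", with axes spelled out :)
doc("greetings.xml")/child::greetings/child::greeting[attribute::from = "brian"],
(: the first greeting element of the whole result, in document order :)
(doc("greetings.xml")//greeting)[1]
```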

There is a lot more to XPath expressions that we cannot cover here. For example, you can use the wild card character * in place of element and attribute names. You can also select nodes by their types rather than names.
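
A short sketch of both ideas against greetings.xml (wildcards.xq is an assumed file name). The counts avoid printing bare attribute nodes, which not every processor serializes the same way:

```xquery
(: wildcards.xq :)
count(doc("greetings.xml")/*/*),     (: 3: any element child of any top-level element :)
count(doc("greetings.xml")//@*),     (: 3: every attribute, whatever its name :)
(: select text nodes by kind rather than by name :)
for $t in doc("greetings.xml")//greeting/text()
return string($t)                    (: Nihao!  Hi!  Bonjour! :)
```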

FLWOR Expressions

FLWOR, pronounced flower, stands for "for, let, where, order by, return", after the five clauses of the expression. The for and let clauses introduce variables and bind them to values. The optional where clause filters the variables. The optional order by clause imposes an order on the variables. The return clause builds the result sequence. Notice that the use of return in XQuery is quite different from Java. It specifies the result of a sub-expression and does not imply returning from a function.

[weiqi@gao] $ qexo
(: 1 :) for $x in (1, 2, 3)
(: 2f:) return <number>{ $x }</number>
<number>1</number><number>2</number><number>3</number>

The for clause binds the variable $x to each item of (1, 2, 3) in turn. (Variable names always start with a dollar sign.) The element constructor in the return clause is evaluated three times. The value of the expression is a sequence of three elements. (Qexo's prompt changes to reflect the clauses we are in.)

(: 3 :) let $a := (1, 2, 3)
(: 4l:) return <numbers>{ $a }</numbers>
<numbers>1 2 3</numbers>

The let clause binds $a to the whole sequence (1, 2, 3). The element constructor in the return clause is evaluated only once. The content of the numbers element is the string value of (1, 2, 3).

(: 5 :) for $x in (1, 2, 3)
(: 6f:) where $x >= 2
(: 7w:) return <number>{ $x }</number>
<number>2</number><number>3</number>

The where clause keeps only the bindings of $x for which the condition $x >= 2 is true.

(: order-by.xq :)
for $x in (<greeting/>, <greeting from="weiqi"/>, <greeting from="brian"/>)
order by $x/@from ascending empty least
return $x
 
[weiqi@gao-2001 junk]$ xquery order-by.xq
<?xml version="1.0" encoding="UTF-8"?>
<greeting/>
<?xml version="1.0" encoding="UTF-8"?>
<greeting from="brian"/>
<?xml version="1.0" encoding="UTF-8"?>
<greeting from="weiqi"/>

Here we sorted a sequence of greeting elements by their from attribute in ascending order where a missing attribute is considered to be less than others. (Saxon puts an XML declaration in front of every document node or top level element in the sequence when they are printed. But Saxon's output format is highly configurable.) You can also specify descending or empty greatest. The default order by direction is ascending. The default empty item treatment is implementation-defined.
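
The clauses can all be combined in a single expression. The following is a sketch only, with an invented file name and an invented from element, run against the greetings.xml document shown earlier:

```xquery
(: flwor-all.xq :)
for $g in doc("greetings.xml")//greeting   (: one binding per greeting element :)
let $name := string($g/@from)              (: bound once for each $g :)
where $name != "luc"                       (: drop the greeting from luc :)
order by $name descending                  (: weiqi sorts before brian :)
return <from>{ $name }</from>
```

The result should be two from elements, weiqi first and brian second.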

Quantifiers

Quantifier expressions test for a condition for all or some items in a sequence. The existential quantifier (some) tests if some member satisfies the condition; the universal quantifier (every) tests if all members satisfy the condition.

[weiqi@gao] $ qexo
(: 1 :) some $x in (1, 2, 3) satisfies $x >= 2
true
(: 2 :) every $x in (1, 2, 3) satisfies $x >= 2
false
(: 3 :) some $x in (1, 2, 3), $y in (3, 4, 5) satisfies $x = $y
true
(: 4 :) every $x in (1, 2, 3), $y in (3, 4, 5) satisfies $x = $y
false
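
Quantifiers work on node sequences as well, and the empty sequence is a useful edge case to keep in mind. A minimal sketch, assuming the greetings.xml document from the XPath section (the file name quantifiers-doc.xq is made up):

```xquery
(: quantifiers-doc.xq :)
some $g in doc("greetings.xml")//greeting satisfies $g/@from = "luc",    (: true :)
every $g in doc("greetings.xml")//greeting satisfies exists($g/@from),   (: true :)
some $x in () satisfies $x > 0,      (: false: there is no item to satisfy it :)
every $x in () satisfies $x > 0      (: true: there is no item to violate it :)
```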

Conditional Expressions

In XQuery's if expression, the else clause is mandatory. The empty sequence () can be used in the else clause to return nothing.

(: 5 :) for $x in (-1.5, 0.4, 1.7)
(: 6f:) return <amount> {
(: 7{:)   if ($x < 0)
(: 8i:)   then
(: 9i:)     concat("(", -$x, ")")
(: 10i:)  else
(: 11i:)    $x
(: 12{:) } </amount>
<amount>(1.5)</amount><amount>0.4</amount><amount>1.7</amount>
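
Returning () from the else clause is a simple way to drop items. In this sketch (negatives.xq and the negative element are illustrative names) only the negative amount produces any output:

```xquery
(: negatives.xq :)
for $x in (-1.5, 0.4, 1.7)
return
  if ($x < 0)
  then <negative>{ $x }</negative>
  else ()     (: contributes nothing to the result sequence :)
```

The result is the single element <negative>-1.5</negative>.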

Functions and Variables

XQuery provides a rich set of built-in functions and operators. These include functions and operators on strings, numbers, dates, times, durations, booleans, nodes, and various other kinds of data encountered in XML.

XQuery also supports user defined functions and variables:

(: fib.xq :)
declare namespace jnb = "http://ociweb.com/jnb";
declare variable $jnb:pi as xs:decimal { 3.1416 };
declare function jnb:fib($i as xs:integer) as xs:integer {
  if ($i = 0 or $i = 1)
  then 1
  else jnb:fib($i - 1) + jnb:fib($i - 2)
};
jnb:fib(3), jnb:fib(4), jnb:fib(5), $jnb:pi
 
[weiqi@gao] $ xquery fib.xq
3
5
8
3.1416

Here we declared jnb as a namespace prefix with a URI of http://ociweb.com/jnb, declared a variable named $jnb:pi, declared a function named jnb:fib that calculates the $i-th Fibonacci number, evaluated the function three times, and printed the value of $jnb:pi. We specified the type of $jnb:pi as xs:decimal. We specified both the parameter type and the return type of the function as xs:integer.

You can append the familiar ?, *, and + occurrence indicators to the type specifiers. Thus a parameter of type xs:integer? accepts either an integer or the empty sequence (). A return type of node()* indicates that the function returns a (possibly empty) sequence of nodes.

Multiple function parameters are separated by commas. Parameter or return type specifications can be omitted, in which case they default to item()*, the type of any XQuery sequence.
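
Here is a sketch of these rules in one place. The functions jnb:twice and jnb:pair are invented for illustration, as is the file name; the first uses occurrence indicators, the second omits all type specifications:

```xquery
(: occurrence.xq :)
declare namespace jnb = "http://ociweb.com/jnb";
(: accepts an integer or (), returns zero or more integers :)
declare function jnb:twice($i as xs:integer?) as xs:integer* {
  if (empty($i)) then () else 2 * $i
};
(: untyped parameters and result default to item()* :)
declare function jnb:pair($a, $b) {
  ($a, $b)
};
jnb:twice(21),                        (: 42 :)
count(jnb:twice(())),                 (: 0: the empty sequence is accepted and returned :)
count(jnb:pair((1, 2), "three"))      (: 3: the arguments are flattened into one sequence :)
```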

XQuery variables are read-only, as there is no way to assign new values to a variable after its declaration. However, they may be shadowed temporarily by variable bindings introduced with for or let clauses.
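
A minimal sketch of shadowing (shadow.xq is an assumed file name): the inner for clause introduces its own $x, and the outer binding is visible again once the inner expression ends.

```xquery
(: shadow.xq :)
let $x := 1
return (
  (for $x in (10, 20) return $x + 1),   (: inner $x shadows the outer one: 11 21 :)
  $x                                    (: the outer binding is unchanged: 1 :)
)
```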

Modules

You can put function and variable declarations into library modules. A library module is a file that starts with a module namespace declaration and contains declarations of functions, variables, etc., but does not contain an expression at the end. A main module contains an expression at the end. Both library modules and main modules can import other library modules to access variables and functions declared in the imported module.

(: libfib.xq :)
module namespace jnb = "http://ociweb.com/jnb";
declare function jnb:fib($i as xs:integer) as xs:integer {
  if ($i <= 1)
  then 1
  else jnb:fib($i - 1) + jnb:fib($i - 2)
};
 
(: mainfib.xq :)
import module namespace jnb = "http://ociweb.com/jnb" at "libfib.xq";
jnb:fib(6)
 
[weiqi@gao] $ xquery mainfib.xq          # Saxon
13

Qexo supports compiled modules. A library module is compiled to a Java class whose name is derived from the module namespace URI. A main module is compiled to a Java class whose name is derived from the module file name.

[weiqi@gao] $ qexo -C libfib.xq          # Compile to Java class com.ociweb.jnb
(compiling libfib.xq)
[weiqi@gao] $ qexo --main -C mainfib.xq  # Compile to Java class mainfib
(compiling mainfib.xq)
[weiqi@gao] $ java mainfib
13

XQuery API for Java

An XQuery API for Java is being developed as JSR 225. Few details are available now.

For the time being, implementation-specific Java APIs can be used to embed XQuery into Java programs. Both Saxon and Qexo provide easy-to-use Java APIs to execute XQuery programs inside a Java process. They also provide ways to call Java methods from XQuery programs.

Summary

We covered the very basics of the XQuery language. There are more features to XQuery than what is presented here. We did not cover W3C XML Schema imports, user defined types from schemas, static type checking, validation, and integration with SQL databases and XML databases.

As the W3C XQuery specifications progress toward recommendation status and beyond, and more Open Source and commercial products become available and more robust, XQuery will become another useful and versatile tool in the Java programmer's toolbox.

Software Engineering Tech Trends (SETT) is a regular publication featuring emerging trends in software engineering.