Introduction to XQuery
Weiqi Gao, OCI Principal Software Engineer
JANUARY 2004
Introduction
XQuery is a strongly typed functional language for processing real or virtual XML data. Its rich data model and ergonomic expressions provide an environment where most programmers would feel at home taking apart and piecing together XML.
The XQuery specifications are prepared by the W3C XML Query Working Group, and are in the Last Call Working Draft stage. Hopefully they will achieve recommendation status in 2004.
In this article, I try to get you started exploring XQuery to see if you can take advantage of it today.
Available Implementations
The W3C XML Query Working Group page has a list of available XQuery implementations.
Saxon by Michael Kay and GNU Qexo by Per Bothner are two open source implementations that I find helpful. Saxon implements most of the mandatory features of the November 12, 2003 XQuery working draft. GNU Qexo, based on the GNU Kawa framework for programming languages for the JVM, supports compiling XQuery programs into Java classes. It also supports interactive sessions, which is great for learning.
To follow along with this article, download
- Saxon 7.8: http://prdownloads.sourceforge.net/saxon/saxon7-8.zip and
- GNU Qexo 1.7.90: http://ftp.gnu.org/pub/gnu/kawa/kawa-1.7.90.jar,
then unzip saxon7-8.zip and add saxon7.jar and kawa-1.7.90.jar to the CLASSPATH.
I use these shell scripts to save some typing:
xquery:
java net.sf.saxon.Query "$@"
qexo:
java kawa.repl --xquery "$@"
[weiqi@gao] $ qexo
(: 1 :) "Hello, World!"
Hello, World!
(: 2 :) 1024, 3.1416, 2.9979e8
1024 3.1416 2.9979E8
(: 3 :) <greeting from="weiqi">Hi</greeting>
<greeting from="weiqi">Hi</greeting>
The smileys are delimiters of XQuery comments. They nest. Qexo uses them as part of the prompt. Cheers you up, doesn't it?
We first entered a string "Hello, World!", and Qexo responded by printing it out. Then we entered a sequence of three numbers: an integer, a decimal, and a double. Finally we created a piece of new XML data: an element with one attribute and some content.
Sequences
All data in XQuery are sequences. A sequence is made up of items. An item can be either an atomic value or a node. A sequence with one item is the same as that item.
The comma operator combines sequences and flattens the result. You cannot create a sequence of sequences.
(: 4 :) (1024, 3.1416, 2.9979e8), ("Hello, World!", <greeting></greeting>)
1024 3.1416 2.9979E8 Hello, World! <greeting></greeting>
Notice that parentheses are used around a sequence when only a single expression is expected as in the above case or in function calls.
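A small sketch of the flattening rule, using only the built-in count() function and the instance of operator. The comments show the values I would expect from a conforming processor:

```xquery
(: Nested parentheses do not nest sequences; the result is flattened. :)
count((1, (2, 3), ())),        (: 3 :)

(: A sequence with one item is the same as that item. :)
(42) instance of xs:integer,   (: true :)

(: The empty sequence has no items. :)
count(())                      (: 0 :)
```

The outer parentheses in count((...)) are needed because a function call expects a single expression per argument.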
XQuery also defines the union, intersect, and except operators for sequences of nodes. Their behavior depends on the concept of node identity, which we'll cover in a later section.
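As a rough illustration, the following sketch binds three freshly constructed elements to variables (the names $a, $b, and $c are only for this example) and counts the nodes each operator produces:

```xquery
let $a := <a/>, $b := <b/>, $c := <c/>
return (
  count(($a, $b) union ($b, $c)),      (: 3, $b is kept only once :)
  count(($a, $b) intersect ($b, $c)),  (: 1, only $b is in both  :)
  count(($a, $b) except ($b, $c))      (: 1, only $a remains     :)
)
```

The duplicate $b is eliminated because both operands refer to the very same node, not merely to two elements that look alike.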
Atomic Values and Simple Types
Strings, integers, decimals, and doubles are the only data types whose value can be entered into an XQuery program as literals.
They are examples of built-in W3C XML Schema types. XQuery supports all built-in W3C XML Schema types plus a few additional types. The W3C XML Schema types have names that start with xs:, for example, xs:string, xs:integer, xs:decimal, xs:double, xs:boolean, xs:float, etc. The XQuery defined types have names that start with xdt:, for example, xdt:dayTimeDuration and xdt:yearMonthDuration. Note that xs and xdt are predeclared namespace prefixes for http://www.w3.org/2001/XMLSchema and http://www.w3.org/2003/11/xpath-datatypes, respectively.
The instance of operator tests the type of a value:
(: instance-of.xq :)
3.1416 instance of xs:decimal
[weiqi@gao] $ xquery instance-of.xq
true
Constructor functions exist for all built-in types. Each one converts a string or other value into a value of its type:
(: constructor.xq :)
xs:date("2003-12-31") instance of xs:date
[weiqi@gao] $ xquery constructor.xq
true
The cast as operator works exactly like a constructor function:
(: cast-as.xq :)
1 cast as xs:boolean,
0 cast as xs:boolean,
"true" cast as xs:boolean,
"false" cast as xs:boolean
[weiqi@gao] $ xquery cast-as.xq
true
false
true
false
Nodes and Node Types
XML data appear in XQuery programs as nodes. Nodes can be either created anew or selected from existing nodes using XPath expressions.
[weiqi@gao] $ qexo
(: 1 :) <greeting from="weiqi">Hello, World!</greeting>
<greeting from="weiqi">Hello, World!</greeting>
(: 2 :) document {
(: 3{:) element { "greeting" } {
(: 4{:) attribute { "from" } { "weiqi" },
(: 5{:) "Hello, World!"
(: 6{:) }
(: 7{:) }
<greeting from="weiqi">Hello, World!</greeting>
Here we created an element literally, and then created an XML document using the document, element, and attribute constructors. XQuery uses {} to surround enclosed expressions. (Notice how Qexo's prompt changes to indicate the current expression nesting.)
Nodes have types. Types exist for six kinds of nodes in XML: document-node(), element(), attribute(), processing-instruction(), comment(), and text(). The node() type represents all kinds of nodes. Namespace nodes are handled through namespace declarations. The parentheses are part of the type name, not function calls.
(: node-types.xq :)
document { <greeting/> } instance of document-node(),
element greeting { "Hello" } instance of element(),
attribute from { "weiqi" } instance of attribute()
[weiqi@gao] $ xquery node-types.xq
true
true
true
Nodes have identities. Two nodes have the same identity if and only if they are selected from the same spot in the same XML document. Newly constructed nodes always have a new identity. Identities can be tested with the is operator:
(: 8 :) <greeting/> is <greeting/>
false
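For contrast, here is a small sketch in which the same node is reached twice through a variable (the name $g is only for this example). I would expect the first comparison to be true and the second to be false:

```xquery
let $g := <greeting/>
return (
  $g is $g,            (: true: both operands are the same node :)
  $g is <greeting/>    (: false: the constructor creates a new node :)
)
```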
Input Functions
XQuery provides the doc() and collection() functions to bring external XML data into a program. The doc() function takes a URI and returns a document node. The collection() function takes a URI and returns a sequence of nodes. The collection() function interprets the URI in an implementation-specific way.
We can use doc() to input an XML document greeting.xml that contains:
<greeting from = "weiqi">Hell&#111;, World!</greeting>
[weiqi@gao] $ qexo
(: 1 :) doc("greeting.xml")
<greeting from="weiqi">Hello, World!</greeting>
Notice how the spaces surrounding the equal sign disappeared and how a character reference has been resolved (111 is the ASCII code for 'o'). XQuery works on the infoset of XML documents, where insignificant white space, entity references, and CDATA sections have already been resolved.
XPath Expressions
XQuery includes XPath 2.0 as a sublanguage. XPath expressions produce new node sequences out of old ones.
An XPath expression consists of one or more steps separated by / or //. Each step has an axis, a test and optional predicates.
Each step works on the result of the previous steps and produces its own results for the next step. A step goes through each node in the input sequence to generate partial results, which are then put together to form the output sequence.
Let's look at a few XPath expressions as they are applied to the XML document greetings.xml:
<?xml version="1.0" encoding="UTF-8"?>
<greetings>
<greeting from="weiqi">Nihao!</greeting>
<greeting from="brian">Hi!</greeting>
<greeting from="luc">Bonjour!</greeting>
</greetings>
[weiqi@gao] $ qexo
(: 1 :) doc("greetings.xml")/greetings
<greetings>
<greeting from="weiqi">Nihao!</greeting>
<greeting from="brian">Hi!</greeting>
<greeting from="luc">Bonjour!</greeting>
</greetings>
(: 2 :) doc("greetings.xml")//greeting
<greeting from="weiqi">Nihao!</greeting><greeting
from="brian">Hi!</greeting><greeting from="luc">Bonjour!</greeting>
(: 3 :) doc("greetings.xml")//greeting[@from="weiqi"]
<greeting from="weiqi">Nihao!</greeting>
(: 4 :) doc("greetings.xml")//greeting/@from
from="weiqi" from="brian" from="luc"
(: 5 :) doc("greetings.xml")//greeting[1]
<greeting from="weiqi">Nihao!</greeting>
Here we selected the greetings element from the child axis of the XML document, the greeting elements from the descendant axis, the greeting descendants whose from attribute has the value "weiqi", the from attributes of the greeting descendants, and every greeting descendant that is the first greeting child of its parent (in this document, just the first greeting element).
There is a lot more to XPath expressions that we cannot cover here. For example, you can use the wild card character * in place of element and attribute names. You can also select nodes by their types rather than names.
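To give a flavor of wild cards and node type tests, here is a sketch of three expressions against the same greetings.xml, assuming the file is in the current directory as before:

```xquery
(: Any element child of any top-level element: the three greeting elements. :)
doc("greetings.xml")/*/*,

(: Any attribute anywhere in the document: the three from attributes. :)
doc("greetings.xml")//@*,

(: Select by node type rather than name: the text nodes inside greeting. :)
doc("greetings.xml")//greeting/text()
```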
FLWOR Expressions
FLWOR, pronounced flower, stands for "for, let, where, order by, return", after the five clauses of the expression. The for and let clauses introduce variables and bind them to values. The optional where clause filters the variable bindings. The optional order by clause sorts them. The return clause builds the result sequence. Notice that the use of return in XQuery is quite different from Java: it specifies the result of a sub-expression and does not imply returning from a function.
[weiqi@gao] $ qexo
(: 1 :) for $x in (1, 2, 3)
(: 2f:) return <number>{ $x }</number>
<number>1</number><number>2</number><number>3</number>
The for clause binds the variable $x (variable names always start with a dollar sign) to each item of (1, 2, 3) in turn. The element constructor in the return clause is evaluated three times. The value of the expression is a sequence of three elements. (Qexo's prompt changes to reflect the clauses we are in.)
(: 3 :) let $a := (1, 2, 3)
(: 4l:) return <numbers>{ $a }</numbers>
<numbers>1 2 3</numbers>
The let clause binds $a to the whole sequence (1, 2, 3). The element constructor in the return clause is evaluated only once. The content of the numbers element is the items of (1, 2, 3) converted to strings and separated by single spaces.
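One way to see the difference between for and let is to count the items each expression produces. This is only a sketch; the element name n is made up for the example:

```xquery
(: for iterates, so the constructor runs once per item. :)
count(for $x in (1, 2, 3) return <n>{ $x }</n>),   (: 3 :)

(: let binds the whole sequence, so the constructor runs once. :)
count(let $a := (1, 2, 3) return <n>{ $a }</n>)    (: 1 :)
```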
(: 5 :) for $x in (1, 2, 3)
(: 6f:) where $x >= 2
(: 7w:) return <number>{ $x }</number>
<number>2</number><number>3</number>
The where clause keeps only the bindings for which the condition $x >= 2 is true, so no element is constructed for 1.
(: order-by.xq :)
for $x in (<greeting/>, <greeting from="weiqi"/>, <greeting from="brian"/>)
order by $x/@from ascending empty least
return $x
[weiqi@gao] $ xquery order-by.xq
<?xml version="1.0" encoding="UTF-8"?>
<greeting/>
<?xml version="1.0" encoding="UTF-8"?>
<greeting from="brian"/>
<?xml version="1.0" encoding="UTF-8"?>
<greeting from="weiqi"/>
Here we sorted a sequence of greeting elements by their from attribute in ascending order where a missing attribute is considered to be less than others. (Saxon puts an XML declaration in front of every document node or top level element in the sequence when they are printed. But Saxon's output format is highly configurable.) You can also specify descending or empty greatest. The default order by direction is ascending. The default empty item treatment is implementation-defined.
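For comparison, here is a sketch of the same query with the other modifiers. The file name order-by-desc.xq is just a label I made up for the example:

```xquery
(: order-by-desc.xq :)
for $x in (<greeting/>, <greeting from="weiqi"/>, <greeting from="brian"/>)
order by $x/@from descending empty greatest
return $x
```

With empty greatest, the element without a from attribute counts as the largest key, so in descending order I would expect it to come first, followed by the "weiqi" element and then the "brian" element.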
Quantifiers
Quantifier expressions test a condition against all or some items in a sequence. The existential quantifier (some) tests if some member satisfies the condition; the universal quantifier (every) tests if all members satisfy the condition.
[weiqi@gao] $ qexo
(: 1 :) some $x in (1, 2, 3) satisfies $x >= 2
true
(: 2 :) every $x in (1, 2, 3) satisfies $x >= 2
false
(: 3 :) some $x in (1, 2, 3), $y in (3, 4, 5) satisfies $x = $y
true
(: 4 :) every $x in (1, 2, 3), $y in (3, 4, 5) satisfies $x = $y
false
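Quantifiers work on node sequences as well. The following sketch assumes the greetings.xml document from the XPath section is in the current directory:

```xquery
(: Does every greeting carry a from attribute? :)
every $g in doc("greetings.xml")//greeting satisfies $g/@from,           (: true :)

(: Is there a greeting from luc? :)
some $g in doc("greetings.xml")//greeting satisfies $g/@from = "luc",    (: true :)

(: Are all greetings from luc? :)
every $g in doc("greetings.xml")//greeting satisfies $g/@from = "luc"    (: false :)
```

In the first expression the condition is a node sequence, which counts as true when it is not empty.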
Conditional Expressions
In XQuery's if expression, the else clause is mandatory. The empty sequence () can be used after else to return nothing.
(: 5 :) for $x in (-1.5, 0.4, 1.7)
(: 6f:) return <amount> {
(: 7{:) if ($x < 0)
(: 8i:) then
(: 9i:) concat("(", -$x, ")")
(: 10i:) else
(: 11i:) $x
(: 12{:) } </amount>
<amount>(1.5)</amount><amount>0.4</amount><amount>1.7</amount>
Functions and Variables
XQuery provides a rich set of built-in functions and operators. These include functions and operators on strings, numbers, dates, times, durations, booleans, nodes, and various other kinds of data encountered in XML.
XQuery also supports user defined functions and variables:
(: fib.xq :)
declare namespace jnb = "http://ociweb.com/jnb";
declare variable $jnb:pi as xs:decimal { 3.1416 };
declare function jnb:fib($i as xs:integer) as xs:integer {
if ($i = 0 or $i = 1)
then 1
else jnb:fib($i - 1) + jnb:fib($i - 2)
};
jnb:fib(3), jnb:fib(4), jnb:fib(5), $jnb:pi
[weiqi@gao] $ xquery fib.xq
3
5
8
3.1416
Here we declared jnb as a namespace prefix bound to the URI http://ociweb.com/jnb, declared a variable named $jnb:pi, declared a function named jnb:fib that calculates the $i-th Fibonacci number, evaluated the function three times, and printed the value of $jnb:pi. We specified the type of $jnb:pi as xs:decimal. We specified both the parameter type and the return type of the function as xs:integer.
You can append the familiar ?, *, and + occurrence indicators to the type specifiers. Thus a parameter of type xs:integer? accepts either an integer or the empty sequence (). A return type of node()* indicates that the function returns a (possibly empty) sequence of nodes.
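As a sketch of occurrence indicators in a declaration, the hypothetical function jnb:twice below (my own name, not part of any library) accepts an optional integer and returns zero or more integers:

```xquery
(: twice.xq :)
declare namespace jnb = "http://ociweb.com/jnb";

(: xs:integer? accepts one integer or the empty sequence;
   xs:integer* allows the function to return nothing at all. :)
declare function jnb:twice($i as xs:integer?) as xs:integer* {
  if (empty($i))
    then ()
    else 2 * $i
};

jnb:twice(21), jnb:twice(())
```

I would expect this to print just 42, since the second call contributes the empty sequence to the result.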
Multiple function parameters are separated by commas. Parameter or return type specifications can be omitted, in which case they default to item()*, the type of any XQuery sequence.
XQuery variables are read-only, as there is no way to assign new values to a variable after its declaration. However, they may be shadowed temporarily by variable bindings introduced with for or let clauses.
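A minimal sketch of shadowing: the inner for clause reuses the name $x, and the outer binding is visible again once the inner expression ends.

```xquery
let $x := 1
return (
  $x,                                   (: 1, the outer binding :)
  (for $x in (10, 20) return $x + 1),   (: 11 21, the inner $x shadows the outer one :)
  $x                                    (: 1, the outer binding is unchanged :)
)
```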
Modules
You can put function and variable declarations into library modules. A library module is a file that starts with a module namespace declaration and contains declarations of functions, variables, etc., but does not contain an expression at the end. A main module contains an expression at the end. Both library modules and main modules can import other library modules to access variables and functions declared in the imported module.
(: libfib.xq :)
module namespace jnb = "http://ociweb.com/jnb";
declare function jnb:fib($i as xs:integer) as xs:integer {
if ($i <= 1)
then 1
else jnb:fib($i - 1) + jnb:fib($i - 2)
};
(: mainfib.xq :)
import module namespace jnb = "http://ociweb.com/jnb" at "libfib.xq";
jnb:fib(6)
[weiqi@gao] $ xquery mainfib.xq # Saxon
13
Qexo supports compiled modules. A library module is compiled to a Java class whose name is derived from the module namespace URI. A main module is compiled to a Java class whose name is derived from the module file name.
[weiqi@gao] $ qexo -C libfib.xq # Compile to Java class com.ociweb.jnb
(compiling libfib.xq)
[weiqi@gao] $ qexo --main -C mainfib.xq # Compile to Java class mainfib
(compiling mainfib.xq)
[weiqi@gao] $ java mainfib
13
XQuery API for Java
An XQuery API for Java is being developed as JSR 225. Few details are available now.
For the time being, implementation-specific Java APIs can be used to embed XQuery into Java programs. Both Saxon and Qexo provide easy-to-use Java APIs to execute XQuery programs inside a Java process. They also provide ways to call Java methods from XQuery programs.
Summary
We covered the very basics of the XQuery language. There are more features to XQuery than what is presented here. We did not cover W3C XML Schema imports, user defined types from schemas, static type checking, validation and integration with SQL databases and XML databases.
As the W3C XQuery specifications progress toward recommendation status and beyond, and more Open Source and commercial products become available and more robust, XQuery will become another useful and versatile tool in the Java programmer's toolbox.
References
- [1] XQuery Home Page
http://www.w3.org/XML/Query
- [2] JSR 225: XQuery API for Java (XQJ)
http://www.jcp.org/en/jsr/detail?id=225
- [3] Saxon Home Page
http://saxon.sourceforge.net/
- [4] GNU Qexo Home Page
http://www.gnu.org/software/qexo/
- [5] XQuery from the Experts, an Addison-Wesley book
http://www.awprofessional.com/catalog/product.asp
- [6] An XQuery Community Web Site
http://xquery.com/
Software Engineering Tech Trends (SETT) is a regular publication featuring emerging trends in software engineering.