The Java Collections API
By Dean Wette, OCI Principal Software Engineer
November 1999
Introduction to Collections
Virtually any non-trivial program in Java – or any other programming language for that matter – deals with groups of objects organized together within a collecting or containing object.
The most familiar container object is the indexed array, which is supported by most, if not all, modern general-purpose programming languages.
Arrays, which are typically fixed in size once created, represent a sequential arrangement of like objects where access to individual elements is via a positional index into the array. An array is typically used to represent a (possibly sorted) list of objects that might be traversed in stored order.
Another common collection is the associative array, also known as a map, which is organized non-sequentially in unknown order by key/value pairs, and does not need to be permanently fixed in size when created. The "value" is the element itself (or a reference to it) and the "key" is a unique identifier for that element. Generally the memory position of a value is determined by applying a hashing algorithm to its key.
Maps represent mappings of items with unique context, such as dictionaries or other keyed lists. Whereas indexed arrays provide element access by sequential position, maps provide element access by named association.
Each of these types of arrays has its own advantages and disadvantages depending on context, but how these containers are used is another critical concern in terms of ease of use, flexibility, and reusability. This is where collections frameworks become important.
Collection Frameworks
A collections framework represents a unified application programming interface for not only representing collections of objects, but also for manipulating them in a consistent manner at a high enough level of abstraction to encourage reusability and good object-oriented design.
Well-designed object-oriented collection frameworks provide an architecture and infrastructure with the following three characteristics:
- Interfaces. Abstract data types for representing collections of objects that provide behaviors for manipulating collections. These interfaces and their methods work independently from implementation details of the objects they collect together.
- Implementations. Concrete implementations of the interfaces that characterize common types of collections. They provide core functionality of the framework without additional development work.
- Algorithms. Methods that provide computations on collections, such as sorting, searching, reversals, and copying.
Why use a collection framework?
Developers and managers both have much to gain by promoting the use of a collections framework.
Many OO programmers understand the benefits of collections frameworks, such as Smalltalk’s collection classes and C++’s Standard Template Library (STL).
Collections frameworks reduce programming effort by providing useful data structures and algorithms commonly used in most software systems, leaving more time to focus on the unique details of a development project.
They also promote software reuse, since the interfaces and behaviors they define encourage reusability by their very nature.
Moreover, they provide interoperability and consistency between otherwise unrelated interfaces, resulting in reduced inter-object dependencies and increased flexibility, which are important attributes of good object-oriented design.
While versions of Java before the Java 2 platform (jdk1.2) provided collection implementations with Array
, Vector
, and Hashtable
, they did not provide a framework within to use these collections, making efficient and intelligent use of collections non-trivial and time-consuming.
This is now changed with the introduction of Java 2 and its extensible Collections API. In fact, the Collections API is one of the most important new enhancements to the Java platform.
For those using Java 1.1, the Collections API is also available as a separate package, albeit with some limitations.
The Java Collections API – Interfaces and Implementations
The Java Collections API incorporates all the characteristics listed above for a collections framework. Most of the interfaces and classes are found in java.util
while some of the infrastructure interfaces are in java.lang
.
The framework itself is organized hierarchically with the Collection
interface as the root.
Collection Interface
The Collection
interface represents a group of elements and provides a high level of abstraction for transferring data between unrelated interfaces. It is defined as follows:
- public interface Collection {
- // Basic Operations
- int size();
- boolean isEmpty();
- boolean contains(Object element);
- boolean add(Object element); // Optional
- boolean remove(Object element); // Optional
- Iterator iterator();
-
- // Bulk Operations
- boolean containsAll(Collection c);
- boolean addAll(Collection c); // Optional
- boolean removeAll(Collection c); // Optional
- boolean retainAll(Collection c); // Optional
- void clear(); // Optional
-
- // Array Operations
- Object[] toArray();
- Object[] toArray(Object a[]);
- }
Collection
provides all the basic operations one might expect for a collection of objects: creation, addition, deletion, and iteration, as well as query and conversion operations.
Next in the hierarchy from Collection
are the Set
and List
interfaces.
Set Interface
A Set
is a Collection
with unique elements and prevents duplication within the collection. Two concrete implementations are provided:
HashSet
– stores its elements in a hashtable and provides the best performanceTreeSet
– guarantees iteration order of its elements by storing them in a red-black tree
In addition, an abstract skeletal implementation is provided in AbstractSet
for those who want more functionality without the effort of implementing basic operations.
List Interface
A List
is a Collection
with an ordered sequence of its elements and may contain duplicate objects.
List
s can be accessed by integer index, giving the user precise control over element position.
The Vector
class is an example of a List
that has been retrofitted to implement the List
interface in Java 2.
The List
interface adds operations to Collection
for positional access, search, and range view. In addition to Vector
, two new concrete implementations are provided:
ArrayList
– stores its elements internally as an array and provides best performance under most circumstancesLinkedList
– stores its elements as a doubly-linked list
Maps
The Collections API also supports maps, but within a hierarchy distinct from Collection
.
A Map
is an object that maps keys to values, where the list of keys is itself a Collection
object. The map can contain duplicate values, but the keys in a map must be distinct.
The top of this hierarchy is the Map
interface, which is defined as follows:
- public interface Map {
- // Basic Operations
- Object put(Object key, Object value);
- Object get(Object key);
- Object remove(Object key);
- boolean containsKey(Object key);
- boolean containsValue(Object value);
- int size();
- boolean isEmpty();
-
- // Bulk Operations
- void putAll(Map t);
- void clear();
-
- // Collection Views
- public Set keySet();
- public Collection values();
- public Set entrySet();
-
- // Interface for entrySet elements
- public interface Entry {
- Object getKey();
- Object getValue();
- Object setValue(Object value);
- }
- }
The Hashtable class is an example of a Map
that has also been retrofitted to implement the Map
interface in Java 2.
Two additional concrete implementations of Map
are provided:
HashMap –
stores its elements in a hashtable and provides the best performanceTreeMap –
guarantees iteration order of its elements by storing them in a red-black tree
As with the Set
interface, an abstract skeletal implementation is provided in AbstractMap
.
SortedSet and SortedMap Interfaces
Following from Set
in the Collection
s hierarchy and from Map
in the Map
hierarchy are SortedSet and SortedMap
interfaces for maintaining collections in sorted order.
The Collections API provides support for object ordering in two ways:
- One is with the
Comparable
interface, which imposes a natural order on classes that implement it. - For classes that don't implement
Comparable
, or when one needs even more control over ordering, theComparator
interface is provided.
In Java 2, String
, all the wrapper classes (Integer
, Double
, etc.) and several other standard classes have been retrofitted to implement the Comparable
interface, but users of the Collections API with Java 1.1 can create Comparator
objects when sorted collections are needed.
Any class that implements either interface can be used in a SortedSet
or SortedMap
object. The TreeSet
and TreeMap
classes are concrete implementations of SortedSet
and SortedMap
respectively.
The Java Collections API – Algorithms
The Collections API also provides reusable functionality with a core set of algorithms that operate on List
collections. Furthermore, the polymorphic nature of these algorithms enables their operation on a variety of classes that implement a common interface.
The provided operations include:
- Sorting – reorders a
List
according to the ordering defined by theComparable
orComparator
implementation of it elements - Shuffling – does the opposite by destroying the ordering of a List
- Reverse – reverses the order of a
List
- Fill – overwrites every
List
element with a specified value - Copy – copies elements of one
List
into anotherList
- Binary search – searches for a specified
List
element using a binary search algorithm - Extreme value search – finds the minimum and maximum values in a collection
The last two algorithms operate on any Collection
objects rather than just List
s.
The Java Collections API – Other Features and Benefits
The Collections API has some other important features that add flexibility, performance, and robustness to programs that utilize it.
Set
Algebra – The core operations inCollection
s provide some powerful tools for performing set algebra, such as finding subsets, intersections, and unions between objects, includingCollection
views ofMap
s.- Performance –
Collection
s have much better performance compared to the olderVector
andHashtable
classes with the elimination of synchronization overhead (Vector
andHashtable
remain synchronized as before). - Thread Safety – When synchronization is required, wrapper implementations are provided for temporarily synchronizing existing collection objects.
- Immutability – When immutability is required (such as using a
Collection
as a bean property), wrapper implementations are provided for making a collection immutable. - Extensibility – The interfaces and abstract classes provide an excellent starting point for adding functionality and features to create specialized object collections.
There is much more to appreciate about the new Java Collections API. This article barely scratches the surface, but more information is readily available in addition to that included in the online and printed versions of The Java Tutorial Continued.
References
- [1] Campione, Mary, and the Tutorial Team. The Java Tutorial Continued. Reading, MA: Addison-Wesley. 1998. ISBN 0-201-48558-3.
The printed version of the online tutorial mentioned above. The Collections section is very well written and one of the best technology introductions of any kind. Arguably the best place to start. - [2] Chan, Patrick; Rosanna Lee and Douglas Kramer. The Java Class Libraries Second Edition, Volume 1: Supplement for the Java 2 Platform Standard Edition, v1.2. Reading, MA: Addison-Wesley. 1999. ISBN 0-201-48552-4.
Contains API documentation with excellent annotations for everything in the Collections API, and includes tips on when to use each type of collection.
Software Engineering Tech Trends (SETT) is a regular publication featuring emerging trends in software engineering.