The Java Collections API

The Java Collections API

By Dean Wette, OCI Principal Software Engineer

November 1999


Introduction to Collections

Virtually any non-trivial program in Java – or any other programming language for that matter – deals with groups of objects organized together within a collecting or containing object. 

The most familiar container object is the indexed array, which is supported by most, if not all, modern general-purpose programming languages.

Arrays, which are typically fixed in size once created, represent a sequential arrangement of like objects where access to individual elements is via a positional index into the array. An array is typically used to represent a (possibly sorted) list of objects that might be traversed in stored order.

Another common collection is the associative array, also known as a map, which is organized non-sequentially in unknown order by key/value pairs, and does not need to be permanently fixed in size when created. The "value" is the element itself (or a reference to it) and the "key" is a unique identifier for that element. Generally the memory position of a value is determined by applying a hashing algorithm to its key. 

Maps represent mappings of items with unique context, such as dictionaries or other keyed lists. Whereas indexed arrays provide element access by sequential position, maps provide element access by named association.  

Each of these types of arrays has its own advantages and disadvantages depending on context, but how these containers are used is another critical concern in terms of ease of use, flexibility, and reusability.  This is where collections frameworks become important.

Collection Frameworks

A collections framework represents a unified application programming interface for not only representing collections of objects, but also for manipulating them in a consistent manner at a high enough level of abstraction to encourage reusability and good object-oriented design. 

Well-designed object-oriented collection frameworks provide an architecture and infrastructure with the following three characteristics:

  1. Interfaces. Abstract data types for representing collections of objects that provide behaviors for manipulating collections. These interfaces and their methods work independently from implementation details of the objects they collect together.
  2. Implementations. Concrete implementations of the interfaces that characterize common types of collections. They provide core functionality of the framework without additional development work.
  3. Algorithms. Methods that provide computations on collections, such as sorting, searching, reversals, and copying.

Why use a collection framework?

Developers and managers both have much to gain by promoting the use of a collections framework.

Many OO programmers understand the benefits of collections frameworks, such as Smalltalk’s collection classes and C++’s Standard Template Library (STL).

Collections frameworks reduce programming effort by providing useful data structures and algorithms commonly used in most software systems, leaving more time to focus on the unique details of a development project. 

They also promote software reuse, since the interfaces and behaviors they define encourage reusability by their very nature.

Moreover, they provide interoperability and consistency between otherwise unrelated interfaces, resulting in reduced inter-object dependencies and increased flexibility, which are important attributes of good object-oriented design.

While versions of Java before the Java 2 platform (jdk1.2) provided collection implementations with Array, Vector, and Hashtable, they did not provide a framework within to use these collections, making efficient and intelligent use of collections non-trivial and time-consuming. 

This is now changed with the introduction of Java 2 and its extensible Collections API. In fact, the Collections API is one of the most important new enhancements to the Java platform.

For those using Java 1.1, the Collections API is also available as a separate package, albeit with some limitations.

The Java Collections API – Interfaces and Implementations

The Java Collections API incorporates all the characteristics listed above for a collections framework. Most of the interfaces and classes are found in java.util while some of the infrastructure interfaces are in java.lang.

The framework itself is organized hierarchically with the Collection interface as the root.

Collection Interface

The Collection interface represents a group of elements and provides a high level of abstraction for transferring data between unrelated interfaces. It is defined as follows:

  1. public interface Collection {
  2. // Basic Operations
  3. int size();
  4. boolean isEmpty();
  5. boolean contains(Object element);
  6. boolean add(Object element); // Optional
  7. boolean remove(Object element); // Optional
  8. Iterator iterator();
  9.  
  10. // Bulk Operations
  11. boolean containsAll(Collection c);
  12. boolean addAll(Collection c); // Optional
  13. boolean removeAll(Collection c); // Optional
  14. boolean retainAll(Collection c); // Optional
  15. void clear(); // Optional
  16.  
  17. // Array Operations
  18. Object[] toArray();
  19. Object[] toArray(Object a[]);
  20. }

Collection provides all the basic operations one might expect for a collection of objects: creation, addition, deletion, and iteration, as well as query and conversion operations.

Next in the hierarchy from Collection are the Set and List interfaces.

Set Interface

A Set is a Collection with unique elements and prevents duplication within the collection. Two concrete implementations are provided:

  1. HashSet – stores its elements in a hashtable and provides the best performance
  2. TreeSet – guarantees iteration order of its elements by storing them in a red-black tree

In addition, an abstract skeletal implementation is provided in AbstractSet for those who want more functionality without the effort of implementing basic operations.

List Interface

A List is a Collection with an ordered sequence of its elements and may contain duplicate objects. 

Lists can be accessed by integer index, giving the user precise control over element position. 

The Vector class is an example of a List that has been retrofitted to implement the List interface in Java 2. 

The List interface adds operations to Collection for positional access, search, and range view. In addition to Vector, two new concrete implementations are provided:

  1. ArrayList – stores its elements internally as an array and provides best performance under most circumstances
  2. LinkedList – stores its elements as a doubly-linked list

Maps

The Collections API also supports maps, but within a hierarchy distinct from Collection

A Map is an object that maps keys to values, where the list of keys is itself a Collection object. The map can contain duplicate values, but the keys in a map must be distinct.

The top of this hierarchy is the Map interface, which is defined as follows:

  1. public interface Map {
  2. // Basic Operations
  3. Object put(Object key, Object value);
  4. Object get(Object key);
  5. Object remove(Object key);
  6. boolean containsKey(Object key);
  7. boolean containsValue(Object value);
  8. int size();
  9. boolean isEmpty();
  10.  
  11. // Bulk Operations
  12. void putAll(Map t);
  13. void clear();
  14.  
  15. // Collection Views
  16. public Set keySet();
  17. public Collection values();
  18. public Set entrySet();
  19.  
  20. // Interface for entrySet elements
  21. public interface Entry {
  22. Object getKey();
  23. Object getValue();
  24. Object setValue(Object value);
  25. }
  26. }

The Hashtable class is an example of a Map that has also been retrofitted to implement the Map interface in Java 2. 

Two additional concrete implementations of Map are provided:

  1. HashMap – stores its elements in a hashtable and provides the best performance
  2. TreeMap – guarantees iteration order of its elements by storing them in a red-black tree

As with the Set interface, an abstract skeletal implementation is provided in AbstractMap.

SortedSet and SortedMap Interfaces

Following from Set in the Collections hierarchy and from Map in the Map hierarchy are SortedSet and SortedMap interfaces for maintaining collections in sorted order.

The Collections API provides support for object ordering in two ways:

In Java 2, String, all the wrapper classes (Integer, Double, etc.) and several other standard classes have been retrofitted to implement the Comparable interface, but users of the Collections API with Java 1.1 can create Comparator objects when sorted collections are needed.

Any class that implements either interface can be used in a SortedSet or SortedMap object. The TreeSet and TreeMap classes are concrete implementations of SortedSet and SortedMap respectively.

The Java Collections API – Algorithms

The Collections API also provides reusable functionality with a core set of algorithms that operate on List collections. Furthermore, the polymorphic nature of these algorithms enables their operation on a variety of classes that implement a common interface.

The provided operations include:

  1. Sorting – reorders a List according to the ordering defined by the Comparable or Comparator implementation of it elements
  2. Shuffling – does the opposite by destroying the ordering of a List
  3. Reverse – reverses the order of a List
  4. Fill – overwrites every List element with a specified value
  5. Copy – copies elements of one List into another List
  6. Binary search – searches for a specified List element using a binary search algorithm
  7. Extreme value search – finds the minimum and maximum values in a collection

The last two algorithms operate on any Collection objects rather than just Lists.

The Java Collections API – Other Features and Benefits

The Collections API has some other important features that add flexibility, performance, and robustness to programs that utilize it.

  1. Set Algebra – The core operations in Collections provide some powerful tools for performing set algebra, such as finding subsets, intersections, and unions between objects, including Collection views of Maps.
  2. Performance – Collections have much better performance compared to the older Vector and Hashtable classes with the elimination of synchronization overhead (Vector and Hashtable remain synchronized as before).
  3. Thread Safety – When synchronization is required, wrapper implementations are provided for temporarily synchronizing existing collection objects.
  4. Immutability – When immutability is required (such as using a Collection as a bean property), wrapper implementations are provided for making a collection immutable.
  5. Extensibility – The interfaces and abstract classes provide an excellent starting point for adding functionality and features to create specialized object collections.

There is much more to appreciate about the new Java Collections API. This article barely scratches the surface, but more information is readily available in addition to that included in the online and printed versions of The Java Tutorial Continued

References



Software Engineering Tech Trends (SETT) is a regular publication featuring emerging trends in software engineering.