Google Collections, Part 2

Google Collections, Part 2

By Dan Lewis, OCI Senior Software Engineer

February 2009


Introduction

Data Structures are fundamental to computer software. They allow the efficient storage, manipulation, and retrieval of data. Java's implementation of the canonical Data Structures is called Java Collections FrameworkGoogle Collections, an open source contribution from Google extends the Java Collections Framework. Google Collections is not a replacement for the Java Collections Framework, but rather supplements and extends the Java Collections Framework to provide more power and flexibility to the programmer.

This article is preceded by the April 2008 edition of the Google Collections SETT article. This article may be read independently of the previous article, but newcomers to Google Collections are encouraged to read the previous article.

Disclaimer: Google Collections library is currently under development and continues to evolve as of the time of this writing, and some APIs may be subject to change.

Topics

The following topics demonstrate selected Google Collections features and review the Java Collections Framework upon which it is based. The full source code of the examples is available.

  1. Use the Right Collection
  2. Take the Inverse View
  3. Check Types at Runtime
  4. Enforce Immutability
  5. Filter Out Unwanted Objects
  6. Collect User-Defined Types
  7. Case Study: Find the Right Blood Donor

1. Use the Right Collection

Think outside the List. Java Collections Framework and Google Collections provide many useful data structures beyond the List. Sometimes, a lot of code can be saved by using the correct collection.

Set

Sets do not allow repeated elements and do not preserve order. This example uses the newHashSet(E... elements) static method of the Sets class that accepts a varargs array of elements. Note that you don't have to specify the types on the right hand side of the assignment. Arrays or Lists that have uniqueness checks around every add method are good candidates to be refactored to be Sets.

Set<String> dontForget = Sets.newHashSet("Keys", "Badge", "Wallet");
dontForget.add("Keys"); // ignored, set will still only contain one instance of "Keys"

Multiset

Multisets (also known as bags) do allow repetition and do not preserve order. Subclass TreeMultiset keeps elements sorted according to their "natural" sort order (see http://java.sun.com/docs/books/tutorial/collections/interfaces/order.html) or can accept a Comparator. All Multisets know how many occurrences of each object are contained within and provide a count method. See the "Bag Groceries with Multiset" example in the previous Google Collections SETT article.

Multiset<String> groceries = Multisets.newHashMultiset("Apple", "Orange", "Orange", "Banana", "Pear");
System.out.printf("groceries contains %d oranges\n", groceries.count("Orange"));

The output is:

groceries contains 2 oranges

List

Lists allow repetition and preserve order. The following list contains the combination to a combination lock. The newLinkedList(E...elems) method was deprecated between versions 0.6 and 0.8 of Google Collections because of a conflict with another method of the same name that accepts an Iterator. The newArrayList(E...elems) method may well be deprecated by 1.0 for the same reason. If this occurs, Collections.addAll() provides an alternative easy way to create populated lists.

List<Integer> combination = Lists.newArrayList(36, 22, 36);

Map

Maps don't allow duplicate keys, and each key maps to only one value. Multiple keys may refer to the same value. The newHashMap(K, V)newHashMap(K, V, K, V), and newHashMap(K, V, K, V, K, V) methods were deprecated between 0.6 and 0.8 and will likely not return.

  1. Map<String, Integer> numLegs = Maps.newHashMap();
  2. numLegs.put("Horse", 4);
  3. numLegs.put("Zebra", 4);
  4. numLegs.put("Spider", 8);
  5. numLegs.put("Monkey", 2);

BiMap

BiMaps require unique keys and values but provide an inverse view. Review the example below, and also see the "Do Inverse Lookups with BiMap" example in the previous Google Collections SETT article.

  1. BiMap<String, String> usCoinsFrontBack1998 = Maps.newHashBiMap();
  2. usCoinsFrontBack1998.put("Abraham Lincoln", "Lincoln Memorial");
  3. usCoinsFrontBack1998.put("Thomas Jefferson", "Monticello");
  4. usCoinsFrontBack1998.put("Franklin D. Roosevelt", "Olive branch, Torch, Oak Branch");
  5. usCoinsFrontBack1998.put("George Washington", "Eagle");
  6. //usCoinsFrontBack1998.put("Not George Washington", "Eagle"); // java.lang.IllegalArgumentException
  7.  
  8. String query1 = "Franklin D. Roosevelt";
  9. System.out.printf("front = %s, back = %s\n", query1, usCoinsFrontBack1998.get(query1));
  10.  
  11. String query2 = "Monticello";
  12. System.out.printf("back = %s, front = %s\n", query2,
  13. usCoinsFrontBack1998.inverse().get(query2)); // now use the inverse lookup capability

The output is:

front = Franklin D. Roosevelt, back = Olive branch, Torch, Oak Branch
back = Monticello, front = Thomas Jefferson

Multimap

Multimap allows keys to map to multiple values. Pass a key into get() on a Multimap and it will return a collection of all of the values associated with that key. a collection. Google provides and implements three subinterfaces: ListMultimap, SetMultimap, and SortedSetMultimap. ListMultimap allows duplicate key-value pairs and preserves insertion order of values. SetMultimap disallows multiple key-value pairs. SortedSetMultimap is a SetMultimap that sorts the values for each key. The following example uses HashMultimap, an implementation of SetMultimap. One use of Multimap is to check the frequency of a particular key. See the "Check the Frequency with Multimap" example in the previous Google Collections SETT article.

  1. Multimap<String, String> pets = Multimaps.newHashMultimap();
  2. pets.put("John", "Cat");
  3. pets.put("John", "Dog");
  4. pets.put("Anne", "Fish");
  5. pets.put("Anne", "Bird");
  6. pets.put("Anne", "Bunny");
  7. pets.put("Anne", "Cat");
  8. pets.put("Dan", "Cat");
  9.  
  10. final Collection<String> annePets = pets.get("Anne");
  11. System.out.printf("Anne's pets: %s\n", Join.join(", ", annePets));

The output is:

Anne's pets: Bunny, Bird, Cat, Fish

2. Take the Inverse View

BiMap supports an inverse view, but Google Collections makes it easy to invert any Map or Multimap. Multimaps.forMap() converts any Map to a Multimap. Once you have a Multimap, call Multimaps.inverseHashMultimap(), Multimaps.inverseArrayListMultimap() or Multimaps.inverseTreeMultimap() for an inverted view.

  1. SetMultimap<String, Integer> numLegsMulti = Multimaps.forMap(numLegs);
  2. HashMultimap<Integer, String> invNumLegsMulti = Multimaps.inverseHashMultimap(numLegsMulti);
  3. int numLegsQuery = 4;
  4. System.out.printf("animals with %d legs = %s\n", numLegsQuery,
  5. Join.join(", ", invNumLegsMulti.get(numLegsQuery)));

The output is:

animals with 4 legs = Horse, Zebra

3. Check Types at Runtime

One criticism of Java generics is that type information is "erased" at runtime. Java Collections Framework as of version 1.5 added the ability to create a "Checked" view of any Collection, List, Map, Set, SortedMap, or SortedSet. Note that the first example provides a compile warning but allows the put() at runtime.

  1. Map<String, Integer> map = Maps.newHashMap();
  2. Map mapRaw = map;
  3. mapRaw.put("abc", "xyz"); // no RuntimeException

However, by calling Collections.checkedMap() we add runtime checking of the types. Note the runtime exception thrown on the put:

  1. Map<String, Integer> map = Maps.newHashMap();
  2. Map<String, Integer> mapChecked = Collections.checkedMap(map, String.class,
  3. Integer.class);
  4. Map mapCheckedRaw = mapChecked;
  5. mapCheckedRaw.put("abc", "xyz"); // java.lang.ClassCastException

As of 0.8 Google Collections doesn't provide this capability for Multimap. It would be an easy exercise to add this capability using ForwardedMultimap. BiMap and Multiset are accepted by Collections.checkedMap() and Collections.checkedCollection(), respectively.

4. Enforce Immutability

Immutability is the inability to change. The first snippet shows a typical, mutable map:

  1. Map<String, Integer> map = Maps.newHashMap();
  2. map.put("abc", 123); // no RuntimeException

Now, we create a map using the ImmutableMap factory method, and notice that we get a runtime exception if we attempt to add to it.

  1. Map<String, Integer> map = ImmutableMap.of("abc", 123);
  2. map.put("def", 456); // java.lang.UnsupportedOperationException

The of() factory methods allow a max of three elements. Use the provided Builder to add more elements:

  1. Map<String, Integer> map = new ImmutableMap.Builder<String, Integer>().put("abc", 123).build();
  2. map.put("def", 456); // java.lang.UnsupportedOperationException

Sometimes we want to take an existing map and make it immutable:

  1. Map<String, Integer> map = Maps.newHashMap();
  2. map.put("abc", 123); // no RuntimeException
  3.  
  4. Map<String, Integer> immutableMap = ImmutableMap.copyOf(map);
  5. immutableMap.put("def", 456); // java.lang.UnsupportedOperationException

Java Collections Framework provides this capability on its own without Google Collections:

  1. Map<String, Integer> map = Maps.newHashMap();
  2. map.put("abc", 123); // no RuntimeException
  3.  
  4. Map<String, Integer> immutableMap = Collections.unmodifiableMap(map);
  5. immutableMap.put("def", 456); // java.lang.UnsupportedOperationException

5. Filter Out Unwanted Objects

Google Collections provides a Constraint interface and some factory methods to apply constraints to existing collections. For the first example, we take a HashSet and prove that adding null to it is allowed:

  1. Set<String> set = Sets.newHashSet();
  2. set.add("A");
  3. set.add(null);

In this example, we constrain a set to not allow null additions using the built-in notNull() implementation of Constraint:

  1. Set<String> set = Sets.newHashSet();
  2. Set<String> constrainedSet = Constraints.constrainedSet(set, Constraints.notNull());
  3. constrainedSet.add("A");
  4. constrainedSet.add(null); // NullPointerException here

The Constraints class provides methods constrainedCollection(), constrainedList(),constrainedMultiset()constrainedSortedSet() in addition to constrainedSet().

6. Collect User-Defined Types

The examples in this article up to this point collect the built-in Java types, e.g. String, Integer, etc. Java Collections Framework and Google Collections demonstrate increased value when applied to more sophisticated user-defined types, i.e. types the programmer defines to represent abstractions in his/her problem domain.

The prerequisite to collecting a user-defined type is defining high quality equals() and hashCode() methods. See items #8 and #9 in Effective Java by Josh Bloch. The following class has high quality equals() and hashCode() methods.

  1. public class Combination {
  2. private int a;
  3. private int b;
  4. private int c;
  5.  
  6. public Combination(int a, int b, int c) {
  7. this.a = a;
  8. this.b = b;
  9. this.c = c;
  10. }
  11.  
  12. @Override public boolean equals(Object o) {
  13. if (o == this) {
  14. return true;
  15. }
  16. if (!(o instanceof Combination)) {
  17. return false;
  18. }
  19. Combination rhs = (Combination) o;
  20. return rhs.a == a && rhs.b == b && rhs.c == c;
  21. }
  22.  
  23. @Override public int hashCode() {
  24. int result = 17;
  25. result = 31 * result + a;
  26. result = 31 * result + b;
  27. result = 31 * result + c;
  28. return result;
  29. }
  30. }

Now let's use Google Collections' BiMap to collect our combinations and map them to our lockers.

  1. BiMap<String, Combination> locks = Maps.newHashBiMap();
  2. locks.put("school", new Combination(36, 22, 36));
  3. locks.put("ymca", new Combination(14, 7, 28));
  4. locks.put("club", new Combination(3, 18, 6));

Weeks later a combination is found written on a slip of paper in the junk drawer. To which locker does it refer?

BiMap<Combination, String> locksInv = locks.inverse();
System.out.println(locksInv.get(new Combination(36, 22, 36)));

The output is:

school

This works because the Combination class provides an override for equals(). If it didn't, the default definition inherited from Object would be used, and the query into the locksInv Map would have returned null. The default implementation of equals() within the Object class returns true only if both objects are the same object:

  1. public boolean equals(Object obj) {
  2. return (this == obj);
  3. }

Also note that because Combination overrides equals(), it must also, according to contract, override hashCode() as well. From the Object.equals() javadocs:

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Ordering

Sorting is another area where user-defined types can leverage JCF and Google Collections. To take advantage, implement the Comparable interface. As an example, let's consider a time slip. A time slip is a paper that a driver receives after a drag race. It shows his elapsed time for the distance he/she raced. This TimeSlip class implements the Comparable interface and overrides equals() and hashCode():

  1. public class TimeSlip implements Comparable<TimeSlip> {
  2. String driver;
  3. int elapsedTime;
  4.  
  5. public TimeSlip(String driver, int elapsedTime) {
  6. this.driver = driver;
  7. this.elapsedTime = elapsedTime;
  8. }
  9.  
  10. @Override
  11. public boolean equals(Object o) {
  12. if (o == this) {
  13. return true;
  14. }
  15. if (!(o instanceof TimeSlip)) {
  16. return false;
  17. }
  18. TimeSlip rhs = (TimeSlip) o;
  19. return rhs.driver.equals(driver) && rhs.elapsedTime == elapsedTime;
  20. }
  21.  
  22. @Override
  23. public int hashCode() {
  24. int result = 17;
  25. result = 31 * result + driver.hashCode();
  26. result = 31 * result + elapsedTime;
  27. return result;
  28.  
  29. }
  30.  
  31. public int compareTo(TimeSlip timeSlip) {
  32. if (timeSlip == this) {
  33. return 0;
  34. }
  35. if (elapsedTime < timeSlip.elapsedTime) {
  36. return -1;
  37. } else if (elapsedTime == timeSlip.elapsedTime) {
  38. return 0;
  39. } else {
  40. return +1;
  41. }
  42. }
  43.  
  44. @Override
  45. public String toString() {
  46. return String.format("%s/%d", driver, elapsedTime);
  47. }
  48. }

Now TimeSlip instances can be collected with ordering collections. Using Google Collections' TreeMultimap, lets collect last weekend's drag racing results between Dan, Carey, and Dean. Because we have implemented Comparable, TreeMultimap will do the work of automatically sorting the TimeSlips as they are added. And because it's a Multimap, TreeMultimap will also allow multiple values per key.

  1. Multimap<String, TimeSlip> weekend = Multimaps.newTreeMultimap();
  2. weekend.put("Friday", new TimeSlip("Carey", 13));
  3. weekend.put("Friday", new TimeSlip("Dan", 14));
  4. weekend.put("Sunday", new TimeSlip("Dean", 10));
  5. weekend.put("Friday", new TimeSlip("Dean", 12));
  6. weekend.put("Saturday", new TimeSlip("Dan", 13));
  7. weekend.put("Saturday", new TimeSlip("Dean", 11));
  8. weekend.put("Sunday", new TimeSlip("Dan", 12));
  9. weekend.put("Saturday", new TimeSlip("Carey", 12));
  10. weekend.put("Sunday", new TimeSlip("Carey", 11));
  11.  
  12. System.out.printf("Friday's Times: %s\n", Join.join(", ", weekend.get("Friday")));
  13. System.out.printf("Saturday's Times: %s\n", Join.join(", ", weekend.get("Saturday")));
  14. System.out.printf("Sunday's Times: %s\n", Join.join(", ", weekend.get("Sunday")));

The output is:

Friday's Times: Dean/12, Carey/13, Dan/14
Saturday's Times: Dean/11, Carey/12, Dan/13
Sunday's Times: Dean/10, Carey/11, Dan/12

Observe that the results are in order, fastest to slowest, and grouped by day.

7. Case Study: Find the Right Blood Donor

In this exercise Google Collections and Java Collections Framework are used in concert to explore the problem of matching blood donors to recipients. First, we enumerate the blood types:

  1. enum BloodType {
  2. OPOS, APOS, BPOS, ABPOS, ONEG, ANEG, BNEG, ABNEG
  3. }

Now, we choose Multimap to store the compatibility matrix of blood types. The first putAll(K key, V... values) encodes the rule that ONEG can donate to ONEG, OPOS, ANEG, APOS, BNEG, BPOS, ABNEG, and ABPOS. Let's make it immutable because this information will not change during the execution of the program and to reflect the stability of these rules:

  1. // donor -> recipient red blood cell compatibility (source = http://en.wikipedia.org/wiki/Blood_type)
  2. Multimap<BloodType, BloodType> donorToRecipients = new ImmutableMultimap.Builder<BloodType, BloodType>().
  3. putAll(ONEG, ONEG, OPOS, ANEG, APOS, BNEG, BPOS, ABNEG, ABPOS).
  4. putAll(OPOS, OPOS, APOS, BPOS, ABPOS).
  5. putAll(ANEG, ANEG, APOS, ABNEG, ABPOS).
  6. putAll(APOS, APOS, ABPOS).
  7. putAll(BNEG, BNEG, BPOS, ABNEG, ABPOS).
  8. putAll(BPOS, BPOS, ABPOS).
  9. putAll(ABNEG, ABNEG, ABPOS).
  10. putAll(ABPOS, ABPOS).
  11. build();

Later in the exercise we'll want to determine the compatible donors for a recipient. So, we invert the Multimap:

  1. // recipient -> donor red blood cell compatibility
  2. Multimap<BloodType, BloodType> recipientToDonor = Multimaps.inverseHashMultimap(donorToRecipients);

Now, let's collect some fictional people and their blood types, and create an inverse of this map as well.

  1. // cast member -> blood type N -> 1
  2. Map<String,BloodType> castBloodTypes = new ImmutableMap.Builder<String,BloodType>().
  3. put("Fred", ONEG).
  4. put("Wilma", APOS).
  5. put("Pebbles", ANEG).
  6. put("Barney", ABPOS).
  7. put("Betty", OPOS).
  8. put("Bamm-Bamm", ANEG).
  9. build();
  10.  
  11. Multimap<BloodType, String> castBloodTypesInv = Multimaps.inverseHashMultimap(Multimaps.forMap(castBloodTypes));

Barney cuts his finger and needs blood, so let's do some lookups in our maps to find compatible donors from within the cast.

  1. // Barney cuts his finger and needs blood. Who from the cast can donate?
  2. String injured = "Barney";
  3.  
  4. // create an empty set to hold the possible donors for later display
  5. Set<String> possibleDonors = new HashSet<String>();
  6.  
  7. // look up the blood type of the injured person
  8. BloodType injuredType = castBloodTypes.get(injured);
  9.  
  10. // look up the compatible donor types for the injured person's blood type
  11. Collection<BloodType> compatibleDonorTypes = recipientToDonor.get(injuredType);
  12.  
  13. // iterate over each compatible donor bood type
  14. for (BloodType compatibleDonorType : compatibleDonorTypes) {
  15.  
  16. // find cast members with the compatible type
  17. Collection<String> castMembersWithType = castBloodTypesInv.get(compatibleDonorType);
  18.  
  19. // add all cast members with the compatible type to the set of possible donors
  20. possibleDonors.addAll(castMembersWithType);
  21. }
  22.  
  23. possibleDonors.remove(injured); // the donor shouldn't try to donate to him/herself
  24. possibleDonors.remove("Pebbles"); // too young to donate
  25. possibleDonors.remove("Bamm-Bamm"); // too young to donate
  26.  
  27. System.out.printf("%s can donate to %s\n", Join.join(", ", possibleDonors), injured);
  28.  

The output is:

Betty, Fred, Wilma can donate to Barney

Going further, let's capture the distribution of blood types for a reference population in another map. Let's use this information to choose donor based on the availability of people with the same blood type.

Disclaimer: this probably isn't the best algorithm, and may not even be a good one

  1. // blood type -> percentage in USA population (source = http://en.wikipedia.org/wiki/Blood_type)
  2. Map<BloodType, Double> bloodTypeDistributionUSA = new ImmutableMap.Builder<BloodType, Double>()
  3. .put(OPOS, 37.4)
  4. .put(APOS, 35.7)
  5. .put(BPOS, 8.5)
  6. .put(ABPOS, 3.4)
  7. .put(ONEG, 6.6)
  8. .put(ANEG, 6.3)
  9. .put(BNEG, 1.5)
  10. .put(ABNEG, 0.6)
  11. .build();
  12.  
  13. // create a Function that looks up the percentage in population based on Blood Type
  14. Function<BloodType, Double> bloodTypePct = Functions.forMap(bloodTypeDistributionUSA);
  15.  
  16. // create a Function that looks up the Blood Type for a person
  17. Function<String, BloodType> personToBloodType = Functions.forMap(castBloodTypes);
  18.  
  19. // create a Function that looks up the percentage in population with the same Blood Type as a given person
  20. Function<String, Double> personToBloodTypePct = Functions.compose(bloodTypePct, personToBloodType);
  21.  
  22. // user our composed lookups to create a Comparator that sorts by availability
  23. Comparator<String> personToBloodTypePctComparator = Comparators.fromFunction(personToBloodTypePct);
  24.  
  25. // reverse the comparator to prefer blood types with higher availability (descending)
  26. Comparator<String> personToBloodTypePctComparatorDescend =
  27. Collections.reverseOrder(personToBloodTypePctComparator);
  28.  
  29. // create a Set of donors that is sorted according to blood type availability, favoring higher availability
  30. Set<String> possibleDonorsOptimized = Sets.newTreeSet(personToBloodTypePctComparatorDescend, possibleDonors);
  31.  
  32. System.out.printf("%s (sorted in order of preference) can donate to %s",
  33. Join.join(", ", possibleDonorsOptimized), injured);

Note that we use some of the built-in "Functions" that Google collections provides. Functions.forMap() creates a function that simply does a lookup in a Map. Functions.compose() composes two functions, e.g. f(g(x)). Comparators.fromFunction() creates a Comparator given a Function, and we convert the Set of possible donors to an array to facilitate sorting. Finally, we use Java Collections Framework Collections.reverseOrder() to flip the sort order.

The output is:

Betty, Wilma, Fred (sorted in order of preference) can donate to Barney

Considerations

Please consider the items listed below before adopting Google Collections. This will help you to avoid bugs and the need for refactoring later.

Tests

There are still no published tests for the library. An outstanding issue (#54) on this subject explains that the tests are present and being extricated from other non-open source dependencies at Google.

Java Versions Supported

Google Collections target Java 1.5. kevinb9n (Kevin Bourrillion): "We will upgrade to requiring Java 6, but will create a Java 5-compatible branch and will include both forms in our release." (1 Nov. 2007) however, "this will probably not happen for a long time" (6 Jan. 2009)

Version

The stated version of Google Collections is 0.8 alpha.

License

Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0 The Apache License 2.0 is approved by the OSI via the License Review Process. http://www.opensource.org/licenses/alphabetical

Alternatives

Apache Commons Collections - seeks to build upon the JDK classes by providing new interfaces, implementations and utilities.

Commons-Collections with Generics - A Java 5 generics-enabled version of the popular Jakarta Commons-Collections project.

References