Google Collections

Google Collections

By Dan Lewis, OCI Senior Software Engineer

April 2008


Introduction

Google Collections is a free, open source library of Java classes that extend and enhance Java's version of the canonical data structures, collectively called the Java Collections Framework. The goal of the project is to extend the Java Collections Framework and make working with it more convenient.

The most significant extensions of the Java Collections Framework are the new collections interfaces and their supporting implementations: BiMap, MultiMap, and MultiSet. They reside within the com.google.common.collect package.

Google Collections also provides various general-purpose utility classes used by the collections package. They reside within the com.google.common.base package.

Please note that there is a follow-up article now available: Google Collections, Part 2.

Scenarios

The following scenarios demonstrate selected Google Collections features. The full source code is available.

  1. Create Generic Collections Conveniently
  2. Do Inverse Lookups with BiMap
  3. Check the Frequency with MultiMap
  4. Bag Groceries with MultiSet
  5. Transform Lists Lazily

Scenario #1: Create Generic Collections Conveniently

Google Collections makes creating parameterized collections of Lists and Sets convenient.

Before:

  1. List<String> l1 = Arrays.asList("Monday", "Tuesday", "Wednesday");
  2. List<Double> l2 = Arrays.asList(1.0, 2.0, 3.0);
  3. Set<String> s1 = new HashSet<String>();
  4. Collections.addAll(s1, "Keys", "Badge", "Wallet");
  5. Map<String,String> m1 = new HashMap<String, String>();

After:

  1. List<String> l1 = Lists.newArrayList("Monday", "Tuesday", "Wednesday");
  2. List<Double> l2 = Lists.newLinkedList(1.0, 2.0, 3.0);
  3. Set<String> s1 = Sets.newHashSet("Keys", "Badge", "Wallet");
  4. Map<String,String> m1 = Maps.newHashMap();

Note above that Arrays.asList is part of the standard JDK. However, it always returns an ArrayList, while Lists contains several options to choose from.

Observe below how Immutable maps of up to five elements can be constructed and initialized on one line.

  1. enum Party { Elephant, Donkey }
  2. ...
  3. Map<Party,String> partyNames = Maps.immutableMap(Donkey, "Democrat", Elephant, "Republican");
  4. Map<String,Party> affiliations = Maps.immutableMap("Barack", Donkey, "Hillary", Donkey, "John", Elephant);
  5. Set<String> candidates = Sets.immutableSortedSet("John", "Barack", "Hillary");
  6. for (String candidate : candidates) {
  7. System.out.printf("%s is a %s\n", candidate, partyNames.get(affiliations.get(candidate)));
  8. }

The output is:

Barack is a Democrat
Hillary is a Democrat
John is a Republican

Note that the output is sorted, thanks to the use of Sets.immutableSortedSet().

Scenario #2: Do Inverse Lookups with BiMap

From the BiMap javadocs:

A bimap (or "bidirectional map") is a map that preserves the uniqueness of its values as well as that of its keys. This constraint enables bimaps to support an "inverse view", which is another bimap containing the same entries as this bimap but with reversed keys and values.

In this example, we'll create a two-way mapping between U.S. paper currency note denominations and the portraits each features.

  1. BiMap<String,String> denomToPortrait = Maps.newHashBiMap();
  2. denomToPortrait.put("One", "Washington");
  3. denomToPortrait.put("Five", "Lincoln");
  4. denomToPortrait.put("Ten", "Hamilton");
  5. denomToPortrait.put("Twenty", "Jackson");
  6. denomToPortrait.put("Fifty", "Grant");
  7. denomToPortrait.put("Hundred", "Franklin");

As in a standard map we can look up the portrait via the denomination:

  1. System.out.printf("%s is on the %s\n", denomToPortrait.get("Ten"), "Ten");
  2. System.out.printf("%s is on the %s\n", denomToPortrait.get("Fifty"), "Fifty");

The output is:

Hamilton is on the Ten
Grant is on the Fifty

Now we can do an inverse lookup of the denomination via the portrait:

  1. BiMap<String, String> portraitToDenom = denomToPortrait.inverse();
  2. System.out.printf("%s is on the %s\n", "Jackson", portraitToDenom.get("Jackson"));
  3. System.out.printf("%s is on the %s\n", "Lincoln", portraitToDenom.get("Lincoln"));

The output is:

Jackson is on the Twenty
Lincoln is on the Five

Scenario #3: Check the Frequency with MultiMap

A multimap is a map that may contain multiple values for the same key. In the following examples this behavior is used to find the frequency of a given number of siblings. In other words, assume Jerry and Jay are the only people in the sample that have 3 siblings. The frequency of 3 siblings would thus be 2.

  1. Multimap<Integer, String> siblings = Multimaps.newHashMultimap();
  2. siblings.put(0, "Kenneth");
  3. siblings.put(1, "Joe");
  4. siblings.put(2, "John");
  5. siblings.put(3, "Jerry");
  6. siblings.put(3, "Jay");
  7. siblings.put(5, "Janet");
  8.  
  9. for (int i = 0; i < 6; i++) {
  10. int freq = siblings.get(i).size();
  11. System.out.printf("%d siblings frequency %d\n", i, freq);
  12. }

The output is:

0 siblings frequency 1
1 siblings frequency 1
2 siblings frequency 1
3 siblings frequency 2
4 siblings frequency 0
5 siblings frequency 1

There are three subinterfaces of Multimap: SetMultimap, ListMultimap, and SortedSetMultimap. SetMultimap ensures unique key/value pairs, while ListMultimap does not. ListMultimap, however, maintains the addition order of multiple values, and allows duplicates. SortedSetMultimap is like SetMultimap, but the multiple values for a given key are sorted by natural ordering. The example above utilizes a particular implementation of SetMultimap called HashMultimap.

Scenario #4: Bag Groceries with MultiSet

A Multiset, also referred to as a bag in the javadocs, preserves equality even when order is different like a set. Unlike a set, however, a bag may contain duplicates. Google Collections' Multiset provides a count method for counting duplicate members.

  1. HashMultiset<String> mon = Multisets.newHashMultiset();
  2. mon.add("milk");
  3. mon.add("bread");
  4.  
  5. HashMultiset<String> thurs = Multisets.newHashMultiset();
  6. thurs.add("bread");
  7. thurs.add("milk");
  8.  
  9. HashMultiset<String> sat = Multisets.newHashMultiset();
  10. sat.add("milk");
  11. sat.add("bread");
  12. sat.add("bread");
  13. sat.add("bread");
  14. sat.add("bread");
  15.  
  16. System.out.printf("mon = thurs : %s\n",mon.equals(thurs));
  17. System.out.printf("mon = sat : %s\n",mon.equals(sat));
  18. System.out.printf("mon we bought %d of %s\n", mon.count("milk"), "milk");
  19. System.out.printf("sat we bought %d of %s\n", sat.count("bread"), "bread");

The output is:

mon = thurs : true
mon = sat : false
mon we bought 1 of milk
sat we bought 4 of bread

Scenario #5: Transform Lists Lazily

The Lists.transform applies a function to every item in a given list. The function is not called until an element is accessed. This is important because changes in the given list after the transform will be applied.

The following example creates a transformed list that contains the string lengths of the corresponding elements in the original list:

  1. Function<String, Integer> strlen = new Function<String, Integer>() {
  2. @Override
  3. public Integer apply(String from) {
  4. Preconditions.checkNotNull(from);
  5. return from.length();
  6. }
  7. };
  8. List<String> from = Lists.newArrayList("abc", "defg", "hijkl");
  9. List<Integer> to = Lists.transform(from, strlen);
  10. for (int i = 0; i < from.size(); i++) {
  11. System.out.printf("%s has length %d\n", from.get(i), to.get(i));
  12. }

The output is:

abc has length 3
defg has length 4
hijkl has length 5

Now consider this example that detects palindromes:

  1. Function<String, Boolean> isPalindrome = new Function<String, Boolean>() {
  2. @Override
  3. public Boolean apply(String from) {
  4. Preconditions.checkNotNull(from);
  5. return new StringBuilder(from).reverse().toString().equals(from);
  6. }
  7. };
  8. List<String> from = Lists.newArrayList("rotor", "radar", "hannah", "level", "botox");
  9. List<Boolean> to = Lists.transform(from, isPalindrome);
  10. for (int i = 0; i < from.size(); i++) {
  11. System.out.printf("%s is%sa palindrome\n", from.get(i), to.get(i) ? " " : " NOT ");
  12. }
  13. // changes in the "from" list are reflected in the "to" list
  14. System.out.printf("\nnow replace hannah with megan...\n\n");
  15. from.set(2, "megan");
  16. for (int i = 0; i < from.size(); i++) {
  17. System.out.printf("%s is%sa palindrome\n", from.get(i), to.get(i) ? " " : " NOT ");
  18. }
  19.  

The output is:

rotor is a palindrome
radar is a palindrome
hannah is a palindrome
level is a palindrome
botox is NOT a palindrome
 
now replace hannah with megan...
 
rotor is a palindrome
radar is a palindrome
megan is NOT a palindrome
level is a palindrome
botox is NOT a palindrome

Note that "hannah" is replaced with "megan" in the "from" list. We can see that our Palindrome function was applied again because this time the third element is no longer a palindrome. This is what is meant by the term "lazy". The transformation doesn't happen immediately when transform is called, but is applied each time an element is accessed.

Preconditions, used in both transform examples above, is part of the com.google.common.base package. It has static methods useful for validating arguments.

Considerations

Please consider the items listed below before adopting Google Collections. This will help you to avoid bugs and the need for refactoring later.

Tests

There are no published tests for the library. An outstanding issue (#54) on this subject explains that the tests are present and being extricated from other non-open source dependencies at Google.

Java Versions Supported

Google Collections target Java 1.5. (In fact, the IntelliJ IDEA Project files expect to find JDK 1.5.0_06). kevinb9n (Kevin Bourrillion): "We will upgrade to requiring Java 6, but will create a Java 5-compatible branch and will include both forms in our release."

Version

The stated version of Google Collections is 0.5 alpha. The imminent branch for Java 6 and the lack of published tests explain the "alpha" designation.

License

Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0 The Apache License 2.0 is approved by the OSI via the License Review Process. http://www.opensource.org/licenses/alphabetical

Alternatives

Apache Commons Collections - seeks to build upon the JDK classes by providing new interfaces, implementations and utilities.

Commons-Collections with Generics - A Java 5 generics-enabled version of the popular Jakarta Commons-Collections project.

Releases

There have been two "snapshots" made available on the website:

Naming Convention for Utility Classes

Look for classes with names ending in 's'. These classes are usually good places to start because they contain factory initialization methods. See Lists, Maps, and Multimaps.

Conclusion

Google Collections extends the Java Collections Framework with useful new interfaces and makes working with generics-enabled collections convenient.

Test drive this important and useful library now, but watch for publication of the promised tests and more information about the Java 5 / Java 6 branch before adopting. Also, be sure to evaluate both the Apache Commons Collections and Commons Collections with Generics libraries.

References