Saturday, March 22, 2008

Functional Collection Patterns in Ruby

This is a modified-rewrite of my Functional Collection Patterns in Java article, using Ruby.

While many software developers are familiar with the Iterator pattern from the classic ``Design Patterns'' book, the Iterator pattern just scratches the surface of the abstractions available on collections. Classic languages such as Scheme, Common Lisp, and Smalltalk-80 provide much more powerful abstractions over collections. These abstractions are available in Ruby as well.

The patterns listed in this post are all dependent on the availability of high-order functions. A high-order function is a function that takes in a function as an argument, or returns a function. By the end of this article I hope to show that the use of high-order functions is extremely useful for separating the collection's concerns from the user of the collection.

There are three main categories of operations on collections:
  1. Map
  2. Filter
  3. Reduce

Map

The Map pattern evaluates a high-order function on all elements of the collection. It returns a new collection with the results of each function application.




Filter

The Filter pattern evaluates a predicate (a function which returns a Boolean) on each of the elements, returning a new collection which is subset of the original collection.




Reduce

The Reduce pattern evaluates a function on all elements of the collection, returning a scalar value. For instance, in the following diagram, Reduce is applied to sum the area of each shape.





Ruby Examples

This section will show how to use Ruby's blocks as high-order functions to take advantage of the Map, Filter, and Reduce patterns.


High-order functions

High-order functions are realized in Ruby via block closures.


irb(main):001:0> class Foo
irb(main):002:1> def initialize
irb(main):003:2> @bar = 1
irb(main):004:2> end
irb(main):005:1> def modifyBar
irb(main):006:2> @bar = yield(@bar)
irb(main):007:2> end
irb(main):008:1> end
=> nil
irb(main):009:0> baz = Foo.new
=> #<foo:0xb7c2dde8 bar="1">
irb(main):010:0> baz.modifyBar {|bar| bar+1}
=> 2
irb(main):011:0> baz
=> #<foo:0xb7c2dde8 bar="2">


Map

Suppose I have a collection of lowercase letters that I want to convert to uppercase letters. Ruby implements the map pattern with the "collect" method.


irb(main):001:0> foo = ["a", "e", "b", "s", "d"]
=> ["a", "e", "b", "s", "d"]
irb(main):002:0> foo.collect {|c| c.upcase}
=> ["A", "E", "B", "S", "D"]


I prefer the high-order function approach (as opposed to manually iterating through the collection ) because I don't have to set any state, reducing the possibility of iteration-related bugs. The act of iteration over the collection and the creation of the new collection is the responsibility of the collection. My responsibility as a user of the collection is to say how to transform each element of the old collection into the element of the new collection.

Filter

There are a few different ways that you might want to filter a collection. You may want to only return elements where the predicate is true. This is done using the select method. The detect method is a special case of the select method where you're only interested in the first match. Lastly, the reject method only returns elements when the predicate is false.


irb(main):001:0> foo = ["a", "e", "b", "s", "d", "b"]
=> ["a", "e", "b", "s", "d", "b"]
irb(main):002:0> foo.select { |c| c == "b"}
=> ["b", "b"]
irb(main):003:0> foo.detect { |c| c == "b"}
=> "b"
irb(main):004:0> foo.reject { |c| c == "b"}
=> ["a", "e", "s", "d"]



Reduce

The Reduce pattern is useful when you want to return a scalar value from a collection. The high order function passed to the "Reduce" implementation requires an extra parameter, the accumulator. If this doesn't make sense, think about the example where you need to sum the area of various shapes. You need to have some value to accumulate the results into. And just like recursion requires a base case, Reduce needs to have a base value for the accumulator. This is the first argument to "inject"


irb(main):001:0> foo = ["a", "e", "b", "s", "d", "b"]
=> ["a", "e", "b", "s", "d", "b"]
irb(main):002:0> foo.inject("") {|acc, c| acc + c }
=> "aebsdb"



Notice that even though I have an accumulator variable, I'm not actually setting the accumulator to the new value, I'm just returning it. That's because my responsibility is only to provide the new value of the accumulator, the collection takes care of setting state.


Combining expressions

These abstractions are powerful on their own, but even more so when combined. Let's say you want to concatenate all of the lowercase letters of a collection together. First, you use the Filter pattern to remove all uppercase letters. After, you apply the Reduce pattern to aggregate the elements into one string.


irb(main):001:0> foo = ["a", "e", "B", "s", "D"]
=> ["a", "e", "B", "s", "D"]
irb(main):002:0> foo.select {|c| c == c.downcase}.inject("") {|acc,c| acc + c}
=> "aes"


Conclusion

High-order functions can be used to help separate orthogonal concerns. Map, Filter, and Reduce are responsible for iterating over the collection and collecting the results of the high-order function. These patterns are a great example of separating the abstract from the specific.

In my humble opinion, it's important to keep the high-order function free of side effects whenever possible. This removes the possibility of iteration-related bugs, which can be difficult to debug.

4 comments:

J.Wo said...

I loved this post -- it's a concise explanation of advanced collection techniques that I've been missing! Excellent post

Bill Six said...

Good to hear you enjoyed it, I enjoyed writing it. Thanks for the feedback!

Tiago Albineli Motta said...

this post helped me a lot, because i'm just learning ruby now.

strongcoffee said...

Thanks, really nice summary.