One of my favorite features of Scala is the amazingly rich standard collections library. Normally, languages try not to make any assumptions about your preferences of the algorithms used to implement certain operations on data structures, so they don’t provide implementations for these operations at all. Consequently, you either implement your own structures/operations and endure the debt of maintaining them, or you use an external library that implements its own data structures. But what’s the point of having a standard library of collections if the mismatch between everyone’s most basic data representations is inevitable?
Luckily, the Scala collections library contains a plethora of data structures and generic operations that are hard to find in the standard libraries of other languages. Combined with the expressive nature and type safety of Scala, manipulating complex data is a blast.
This article assumes the reader is using Scala 2.13.0
Overview of scala.collection
The scala.collection
package contains definitions of many abstract structures, such as Seq
(sequence) and Map
, and Iterable
, among some generic operations. It also contains mutable
and immutable
sub-packages which contain concrete implementations of these abstract data structures. For example, immutable.List
is an extension of Seq
, and immutable.HashMap
is an extension of Map
. We’ll discuss these types of collections later on.
Iterable
Iterable
is the base of all collections, which has one abstract method: iterator
, and one type argument: A
, the type of elements in the collection. It defines many useful operations such as map
, filter
, and reduce,
so you can expect these operations to be present for all collections. This makes it really easy to define functions of Iterable
s which can operate on many types of collections. For example:
The type argument
A
ofIterable
is covariant, meaning thatIterable[Int]
is a subtype ofIterable[Number]
for example.
Values of type Iterable
can be created using Iterable.apply
:
Notice how the actual value of iterable
has the type List[Int]
. This is the default type that is given to values constructed using Iterable.apply
(since Iterable
is abstract, and List
is a concrete implementation of it.) We’ll take a look at List
s shortly.
We can iterate over any Iterable
s using for
:
You can also use [for](https://docs.scala-lang.org/tour/for-comprehensions.html)
comprehensions to transform and filter the values in the collection and return the resulting collection:
for
comprehensions are actually syntactic sugar for chains of map
, filter
, and foreach
calls. So the code above is equivalent to:
Check out function shorthand syntax if
_ % 2 == 0
looks alien to you.
Any type that implements foreach
can be iterated over using for
. This gives you the ability to iterate over and transform your own types using very clean syntax. You can also enable the use of yield
and if
guards if you implement map
and filter
methods for the type.
You can concatenate Iterable
s using the ++
operator:
Iterable
s also have head
, tail
and isEmpty
methods that make recursion pretty convenient:
Iterable
s actually have asum
method that sums any collection of numbers. How convenient is that?
When using head
and tail
(and some other methods,) you should keep in mind that the collection might be empty. Calling head
or tail
on an empty collection results in a NoSuchElementException
or an UnsupportedOperationException
respectively. A type-safe way to get around this problem is to use headOption
and match its result to know whether to call tail
or not. Calling tail
on a collection with one element will result with an empty collection:
The
[_Option_](https://www.scala-lang.org/api/2.13.0/scala/Option.html)
type encodes a value that might not exist, such as the head of an empty collection in this example.
This pattern of fooOption
appears in many places in the collections API. For example, the reduce
method assumes the first element of the collection to be the initial value for the accumulator, which is returned at the end:
The definition of reduce
is similar to:
Which doesn’t take into account that the collection can be empty. The reduceOption
method returns None
if the collection is empty. The fold
method is similar to reduce
, but it takes the initial value of the accumulator as an argument:
Iterable
s also support drop
, take
, dropWhile
, and takeWhile
:
Iterable
s have many convenience methods, such as groupBy
, which groups elements of the collection based on some attribute (user age in this example):
We can’t cover all the methods of Iterable
(or any subtype of it) in this article, but I encourage you to use the API documentation for reference.
Seq
The Seq
type represents an order-preserving sequence of elements. Any subtype of Seq
has an apply
method that takes an integer index (starting at zero) and returns the element at that location:
All Iterable
s have a size
method, which returns the number of elements in the Iterable
. Seq
s have another method: length
, which does the same thing. Seq
s define the size
method as an alias to the length
method, so they’re exactly the same thing:
You can update the element at a particular index:
Seq
s can be sorted if there’s an implementation of [Ordering](http://scala-lang.org/api/2.13.0/scala/math/Ordering.html)
for its elements in scope:
You can append and prepend elements to a Seq
:
The performance of these operations depends on the concrete implementation of Seq
(the actual value.)List
s are implemented as linked lists in Scala, so they support constant-time prepending of single elements. On the other hand, they take O(n1) time to prepend a list of n1 elements to another list. Accessing elements by index in a List
of n elements also takes O(n) time. It’s useful to know this to be able to determine the best data structure to use.
Vector
s are Seq
s that support constant-time random access/updates of elements, so they’re a good choice if you do a lot more than just iterating over the sequence and prepending elements to the collection.
List
Lists
are implemented as cons lists, which can be defined as:
sealed trait ConsList[+A]
case class Cons[A](head: A, tail: ConsList[A]) extends ConsList[A]
case object Empty extends ConsList[Nothing]
And values of ConsList
can be constructed like so:
But what’s cool is that Scala support syntax that allows a value constructor like Cons
to be a symbol like ::
, so List
s can be constructed in this fashion:
Where Nil
represents the empty list. This is cool because it allows us to match patterns with lists, for example:
Of course, you can always just use List.apply
to construct and match values:
LazyList
A LazyList
is a List
that lazily evaluates its elements. Which means that elements of a LazyList
would never be evaluated unless they are used:
Set
A Set
is a collection of unique elements that can be constructed using Set.apply:
Set
s have an apply
method that takes an element and returns whether the element is a member of the set or not:
Performing any of the Iterable
operations will preserve the uniqueness of the elements in the set. Additionally, Set
s support +
and —
methods that perform element insertion and extraction respectively:
They also support union
and intersect
methods:
Set
s don not preserve the order in which elements were inserted. But you can use ListSet
if order matters to you.
Map
A Map
is a set of key-value pairs that can be created using Map.apply
:
->
is actually just a method that returns a tuple of the two operands, so'a' -> 1
returns('a', 1)
.
You can use the apply
method to get the value of a key:
Similarly to Seq
s, you can use lift
to avoid NoSuchElementException
s:
Map
s also support +
and —
operators:
Adding a new key-value pair will overwrite any existing pairs with the same key:
You can iterate over the key-value pairs of a map using for
(or any method on Iterable
):
So far, we’ve looked at List
, LazyList
, Map
and Set
, which are all immutable collections. The other side of the collections library: scala.collection.mutable
, contains data structures similar to these(plus some new ones, like Stack
,) but ones you can mutate in place (without creating new collections.) We also haven’t looked at Queue
s, which are also in collection.immutable
. It would be difficult to cover all of this in one article (or one book,) but again, I urge you to refer to the API documentation for a complete view of the collections library.
I hope you found this introduction to be helpful. My aim was to help anyone new to Scala get started using some of the most useful functionality of the collections library (and also for me to gush over some of its really cool features.)