Erlang: sets
Join the DZone community and get the full member experience.
Join For FreeWhile we have been instantiating and navigating lists in most of these examples, they are a data structure that provides access only with a O(N) complexity as the list is traversed; only appending and removing an element from a list can be accomplished in constant time.
Erlang provides then other collection abstractions with an internal representation consisting of trees and hash tables, classic solutions to improve the complexity class of operations such as lookup and test of existence of an element.
In this article we will explore sets, and their ordered variation.
ETS primitives
The Erlang Term Storage is an infrastructure shipped with the Erlang VM that allows the construction and easy manipulation of some data structures, one of which are sets.
Let's create a set of Movies, and put some elements in it:
sets_contain_only_a_single_element_for_each_key_test() -> Movies = ets:new(movies, [set]), ets:insert(Movies, {"Star Wars", 1978}), ets:insert(Movies, {"Star Wars", 1977}), [{_, Year}] = ets:lookup(Movies, "Star Wars"), ?assertEqual(1977, Year).
Sets have the property of containing only one element for each instance of their key, which is by default considered as the first field of the inserted tuples ("Star Wars" here). So when we insert a new tuple with the same key, we are effectively overwriting the previous one. ets:lookup/2 then returns a list consisting of only one element, while in the case of other data structures the elements may be many.
Keys aren't hardcoded to be the first field of tuples, and they can also be atoms instead of strings or other types:
sets_contain_only_a_single_element_for_each_key_test() -> Movies = ets:new(movies, [set]), ets:insert(Movies, {"Star Wars", 1978}), ets:insert(Movies, {"Star Wars", 1977}), [{_, Year}] = ets:lookup(Movies, "Star Wars"), ?assertEqual(1977, Year).
In case you want to set up some global state, you can allow referencing a table by name instead of passing around the return value of ets:new/2. Personally I prefer to explicitly pass dependencies to the code that needs them, in any programming language.
names_table_allow_later_referencing_test() -> ets:new(movies, [set, named_table]), ets:insert(movies, {"Star Wars", 1977}), [{_, Year}] = ets:lookup(movies, "Star Wars"), ?assertEqual(1977, Year).
Note that ets:* primitives do introduce some global state to the application, as named tables can be looked up by processes via hardcoded atoms (even between the execution of different automated tests). You should call ets:delete/1 to free an ETS data structure and invalidate accesses to it.
However ETS data structures do not provide a great support for concurrency: you're supposed to hide them inside their own processes. To help you with that and avoid escaping references, Erlang lets you define private storage:
private_sets_cannot_exit_the_current_process_test() -> Movies = ets:new(movies, [set, private]), process_flag(trap_exit, true), spawn_link(fun() -> fill(Movies) end), receive {'EXIT', _, {badarg, [{Module, Function, _}, _]}} -> ?assertEqual(ets, Module), ?assertEqual(insert, Function) end. fill(Set) -> ets:insert(Set, {"Star Wars", 1977}).
Here we have linked the children process to the parent in order to be able to trap its exit as a message. The message contains the description of the problem: the call to ets:insert/2 has failed with a badarg error, because the new process cannot access a private set.
Ordered sets
Orderet sets are a little different in capabilities than ordinary sets: like the name says, they are able to list keys in their lexical order. The cost of this feature is that access is logarithmic in complexity instead of requiring a constant time. In the underlying implementation, a tree is created instead of an hash table to insert values in order.
sets_can_be_ordered_test() -> ets:new(ordered_movies, [ordered_set, named_table]), ets:insert(ordered_movies, {"Terminator", 1984}), ets:insert(ordered_movies, {"Star Wars", 1977}), ?assertEqual("Star Wars", ets:first(ordered_movies)), ?assertEqual("Terminator", ets:next(ordered_movies, "Star Wars")).
The api works by listing for you the first key of the ordered set (ets:first/1) and by telling you the next key with respect to any other (ets:next/2). The rest of the API, such as insertion and deletion, works in the same way as for non-ordered sets.
Querying
We couldn't talk about tables and data structures without talking about querying them on the occasion. Querying ETS features projections (selecting only some columns) and filtering (selecting only some elements):
querying_is_possible_on_single_fields_test() -> Movies = ets:new(movies, [set]), ets:insert(Movies, {"Terminator", 1984, "Cameron"}), ets:insert(Movies, {"Star Wars", 1977, "Lucas"}), ets:insert(Movies, {"Name", 1999, "A"}), ?assertEqual([["Star Wars"]], ets:match(Movies, {'$0', 1977, '_'})).
The syntax of ets:match/2 features a tuple composed of literal values (1977) and special strings.
The special strings may be used for specifying the selection of a field of the elements ('$0') or to signal we don't care about one of them ('_').
The result is a list of lists:
- the main list is composed of all the elements that match the tuple.
- Each element is represented as a list, as its field count may be manipulated with dynamic calls.
Matching also comprehends deleting and more complex match specification, where operations may be evaluated on fields to determine their inclusion. Take a look at the textbook for this little course, O'Reilly Erlang Programming, which has taken me from just knowing how to write functions to a great knowledge of Erlang's unique features.
Opinions expressed by DZone contributors are their own.
Comments