Development Principles and Patterns
Source:vignettes/development_patterns.Rmd
Introduction
The macpan2 package uses the standard S3 object-oriented framework for R. All objects in macpan2 are standard R environments, with standard S3 class attributes. This approach allows us to integrate with standard generic S3 methods (e.g. print, predict), while retaining the benefits of programming styles that are common outside of R such as data-code bundling and passing by reference. We can get these benefits without dependencies on third-party packages such as R6 and instead use standard R tools, but in an unorthodox yet interesting (to us) way.
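As a minimal sketch of the two ingredients named above, the following base R code builds an environment, gives it an S3 class attribute, and shows both S3 dispatch and pass-by-reference behaviour. The Counter class and the increment function are hypothetical names used only for illustration; they are not part of macpan2.

```r
# Hypothetical object: a plain environment with an S3 class attribute.
counter = new.env()
counter$count = 0
class(counter) = "Counter"

# A standard S3 print method dispatches on the class attribute.
print.Counter = function(x, ...) {
  cat("<Counter> count =", x$count, "\n")
  invisible(x)
}

# Passing by reference: the function modifies the caller's object in place,
# so nothing needs to be returned and reassigned.
increment = function(obj) {
  obj$count = obj$count + 1
  invisible(obj)
}

increment(counter)
increment(counter)
print(counter)  # the count is now 2
```

An ordinary R list passed to increment would have been copied, leaving the original unchanged, which is the difference that environments provide.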
Basics of the macpan2 Object Oriented Framework
Constructing and Using Objects
To understand macpan2, users and developers need to understand how to construct objects. Objects in macpan2 have S3 class attributes and are standard R environments with some additional restrictions. Before looking at those restrictions, we will illustrate the basic idea with an example.
The macpan2 package comes with a set of example models, described in vignette("example_models", package = "macpan2"). One simple example is an SIR model stored here.
(sir_path = system.file("model_library", "sir_vax", package = "macpan2"))
#> [1] "/home/runner/work/_temp/Library/macpan2/model_library/sir_vax"
This path points to a directory with the following files.
list.files(sir_path)
#> [1] "derivations.json" "flows.csv"
#> [3] "README.md" "settings.json"
#> [5] "trans.csv" "transmission_dimensions.csv"
#> [7] "transmission_matrices.csv" "variables.csv"
We could read in one of these files using standard R methods, but here we illustrate objects by using the CSVReader function. This function returns an object of class CSVReader.
sir_flows_reader = CSVReader(sir_path, "flows.csv")
sir_flows_reader
#> Classes 'CSVReader', 'Reader', 'Base' <environment: 0x562e5f28ef68>
#> read : function ()
#> read_base : function ()
We get a printout of the S3 class of the object, which is a vector with three items.
class(sir_flows_reader)
#> [1] "CSVReader" "Reader" "Base"
Why are there three items? This is the standard S3 mechanism for inheritance. If this doesn’t make sense, it doesn’t matter for now.
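For readers who do want to see how a multi-item class vector drives S3 inheritance, here is a small sketch in base R. The describe generic and its methods are hypothetical and exist only for this illustration; only the class vector mirrors the one printed above.

```r
# Hypothetical object whose class vector mirrors the one printed above.
obj = structure(new.env(), class = c("CSVReader", "Reader", "Base"))

# Hypothetical generic with methods for only two of the three classes.
describe = function(x, ...) UseMethod("describe")
describe.Reader = function(x, ...) "I am some kind of Reader"
describe.Base = function(x, ...) "I am a Base object"

# There is no describe.CSVReader, so dispatch moves along the class vector
# and uses the first method it finds, here describe.Reader.
describe(obj)

# inherits() checks membership in the class vector.
inherits(obj, "Reader")  # TRUE
inherits(obj, "Writer")  # FALSE
```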
We also saw that this item is just an R environment. More about this later.
The more important thing is that there is a read function listed. This function is a method, but importantly this is not a standard S3 method. Why is it not an S3 method? You guessed it, we will cover that below.
We call methods, like the read() method, in the following way.
sir_flows_reader$read()
#> from to flow type from_partition to_partition
#> 1 S I infection per_capita Epi Epi
#> 2 I R gamma per_capita Epi Epi
#> 3 S.unvax S.vax vaccination per_capita Epi.Vax Epi.Vax
#> 4 S S.unvax birth per_capita_inflow Epi Epi.Vax
#> 5 I S.unvax birth per_capita_inflow Epi Epi.Vax
#> 6 R S.unvax birth per_capita_inflow Epi Epi.Vax
#> 7 S death per_capita_outflow Epi Null
#> 8 I death per_capita_outflow Epi Null
#> 9 R death per_capita_outflow Epi Null
#> flow_partition from_to_partition from_flow_partition to_flow_partition
#> 1 Epi Vax Vax Null
#> 2 Epi Vax Vax Null
#> 3 Vax Null
#> 4 Epi Null
#> 5 Epi Null
#> 6 Epi Null
#> 7 Epi Null Null
#> 8 Epi Null Null
#> 9 Epi Null Null
This sir_flows_reader has read in the CSV file that it is configured to read.
This syntax might look strange to many R users, who will be used to something more like this.
read.csv(file.path(sir_path, "flows.csv"))
#> from to flow type from_partition
#> 1 S I infection per_capita Epi
#> 2 I R gamma per_capita Epi
#> 3 S.unvax S.vax vaccination per_capita Epi.Vax
#> 4 S S.unvax birth per_capita_inflow Epi
#> 5 I S.unvax birth per_capita_inflow Epi
#> 6 R S.unvax birth per_capita_inflow Epi
#> 7 S death per_capita_outflow Epi
#> 8 I death per_capita_outflow Epi
#> 9 R death per_capita_outflow Epi
#> to_partition flow_partition from_to_partition from_flow_partition
#> 1 Epi Epi Vax Vax
#> 2 Epi Epi Vax Vax
#> 3 Epi.Vax Vax
#> 4 Epi.Vax Epi
#> 5 Epi.Vax Epi
#> 6 Epi.Vax Epi
#> 7 Null Epi Null
#> 8 Null Epi Null
#> 9 Null Epi Null
#> to_flow_partition
#> 1 Null
#> 2 Null
#> 3 Null
#> 4 Null
#> 5 Null
#> 6 Null
#> 7 Null
#> 8 Null
#> 9 Null
The reason why sir_flows_reader$read() works without any arguments is that the path to read is stored in the sir_flows_reader object. We can see (almost) everything stored in an object using the ls function.
ls(sir_flows_reader)
#> [1] "file" "read" "read_base"
Here we see that, along with the read method, there is something else called file, which is just the file path that the reader is configured to read.
sir_flows_reader$file
#> [1] "/home/runner/work/_temp/Library/macpan2/model_library/sir_vax/flows.csv"
Object components like this file component, which are not methods, are called fields.
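The distinction between fields and methods can be checked programmatically. The sketch below uses a hypothetical stand-in object built with base R only (not a real CSVReader) and separates its components with is.function. It also shows one reason why ls may not list everything: by default ls skips names that begin with a dot.

```r
# Hypothetical stand-in for a reader object, built with base R only.
reader = new.env()
reader$file = "flows.csv"
reader$read = function() paste("reading", reader$file)
reader$.hidden = "not listed by default"

# ls() omits dot-prefixed names unless all.names = TRUE.
ls(reader)
ls(reader, all.names = TRUE)

# Split the visible components into methods (functions) and fields (the rest).
components = ls(reader)
is_method = vapply(components, function(nm) is.function(reader[[nm]]), logical(1))
methods = components[is_method]
fields = components[!is_method]
```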
That’s the basic idea of how to use objects in macpan2. Here’s a summary.
- Objects are standard R environments
- Objects have standard R S3 classes
- Objects have fields and methods
- Object methods are functions that can make use of other object components
Defining Classes
To define a class we write a function called a constructor. We have already seen a constructor: the CSVReader function above.
Let’s make our own. Let’s make a class that can generate sequences of numbers.
To warm up we will create a class that does nothing and contains nothing, but which illustrates the basic boilerplate code for creating a class.
DoesNothing = function() {
self = Base()
return_object(self, "DoesNothing")
}
does_nothing = DoesNothing()
does_nothing
#> Classes 'DoesNothing', 'Base' <environment: 0x562e608270f8>
The first line in this constructor uses the Base function to create an environment called self.
The second line prepends DoesNothing to the S3 class of self and returns this newly created S3 object.
To make this class more interesting we store an integer as a field.
DoesNothing = function(n) {
self = Base()
self$n = n ## save value of the argument in the object
return_object(self, "DoesNothing")
}
does_nothing = DoesNothing(10)
does_nothing
#> Classes 'DoesNothing', 'Base' <environment: 0x562e60aa5690>
The printout looks identical to the first version, but now we have stored a value for n.
does_nothing$n == 10
#> [1] TRUE
Finally we add a method so that we can do something, and change the name to describe what it can do.
SimpleSequence = function(n) {
self = Base()
self$n = n
self$generate = function() seq_len(self$n)
return_object(self, "SimpleSequence")
}
simple_sequence = SimpleSequence(10)
simple_sequence$generate()
#> [1] 1 2 3 4 5 6 7 8 9 10
Notice that all fields and methods stored in self (e.g. self$n) can be used in methods by using the $ operator to extract the value of the field or method from self. The technical reason why this works is that the self environment is in the environment of every method in the self environment. In fact, the self environment is the only thing in the environment of each method. If this seems mind-bending, don’t worry about it.
Those are the basics of class definitions. Here’s a summary.
- Class definitions are functions for constructing objects of that class
- The first thing to do in a class definition is create the self environment
- The last thing to do in a class definition is return the self environment as an S3 object
- In the middle of a class definition one adds methods and fields to the self environment
- The self environment is the only thing in the environments of the methods in self (don’t worry about it)
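For the curious, the summary above can be imitated with base R alone. The functions base_sketch and return_object_sketch below are simplified, hypothetical stand-ins written for this illustration; they are not the macpan2 implementations of Base and return_object, and the choice of the global environment as the parent of each method environment is an assumption.

```r
# Hypothetical stand-in for Base(): an empty environment with class "Base".
base_sketch = function() {
  structure(new.env(parent = emptyenv()), class = "Base")
}

# Hypothetical stand-in for return_object(): give every method an
# environment that contains only `self`, then prepend the new class name.
return_object_sketch = function(self, class_name) {
  for (nm in ls(self, all.names = TRUE)) {
    if (is.function(self[[nm]])) {
      f = self[[nm]]
      # Assumption: the method environment has the global environment as parent.
      environment(f) = list2env(list(self = self), parent = globalenv())
      self[[nm]] = f
    }
  }
  structure(self, class = c(class_name, class(self)))
}

SimpleSequenceSketch = function(n) {
  self = base_sketch()
  self$n = n
  self$generate = function() seq_len(self$n)
  return_object_sketch(self, "SimpleSequenceSketch")
}

s = SimpleSequenceSketch(3)
s$generate()                  # 1 2 3
ls(environment(s$generate))   # "self" is the only thing in there
```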
Details
Objects
In macpan2, objects are standard R environments with an S3 class attribute. Therefore, our object oriented style involves only basic foundational R concepts: environments and S3 classes.
There are two types of environments in this setup. The first kind of environment is the
- An S3 class attribute
- The environment of every function in this environment is an
Class Definitions
Developers can define a class by defining a standard R function that returns an instance of that class.
We talked a bit about technical details that you shouldn’t worry about in the basics of defining classes. But there is one technicality that you should worry about. Objects created in a constructor can only be used in methods if they are accessible through the self environment. So for example, the following code fails.
BadClass = function() {
self = Base()
x = 10
self$f = function() x^2
return_object(self, "BadClass")
}
try(BadClass()$f())
#> Error in BadClass()$f() : object 'x' not found
This is good because it forces you to be specific about where method dependencies are coming from. What would have been worse is if the above code succeeded in the following way.
x = 10
BadClass()$f()
#> [1] 100
Why did this ‘work’ now? It doesn’t matter because you will never have this problem if you just always refer to self explicitly in methods. In particular, the proper approach would be the following.
GoodClass = function() {
self = Base()
self$x = 10
self$f = function() self$x^2
return_object(self, "GoodClass")
}
GoodClass()$f()
#> [1] 100
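Readers who do want to know why the earlier call 'worked' once x existed at the top level can reproduce the effect with base R. The sketch below assumes, as an illustration only, that a method's environment contains nothing but self and has the global environment as its parent; the names method and scale_factor are hypothetical.

```r
# Hypothetical object and method, built with base R only.
self = new.env()
self$y = 3
method = function() self$y * scale_factor

# Assumption: the method environment holds only `self`, and its parent is
# the global environment.
environment(method) = list2env(list(self = self), parent = globalenv())

# scale_factor is not defined anywhere yet, so the call fails.
first_try = try(method(), silent = TRUE)
inherits(first_try, "try-error")  # TRUE

# Once a variable of that name exists in the global environment, R's
# lexical scoping finds it, and the method silently depends on a global.
assign("scale_factor", 2, envir = globalenv())
method()  # 6
```

Referring to everything through self removes this hidden dependency, which is the point of the GoodClass example.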
Principles
There will be trade-offs among these principles, but they are good guidelines.
Small Classes
You should be able to see the whole constructor definition on a single screen – it is OK if it doesn’t happen though.
Avoid Modifying Well-Tested Classes
Extension is better done by introducing new classes, rather than new methods. Big classes are hard to reason about, test, and stabilize.
Shallow Inheritance Hierarchy
Parent classes may have multiple children, but in these cases the hierarchy should be shallow and simple. For example, consider alternatives if some children inherit directly from an intermediate parent. When things like this start to happen, it is usually best to just extend the intermediate parent so that it can inherit directly from the Base class.
Balance Regeneration with Consistency
A naive approach to keeping the components of objects consistent is to regenerate the object with every change. But continual regeneration can be expensive. It is best to avoid this trade-off as much as possible by making fields that are cheap to compute into methods that always recompute what the user is asking for. But some fields are too expensive to regenerate and therefore need to be stored and only regenerated when necessary.
Patterns
Here are some design patterns for complying with these principles.
Alternative Classes
Alternative versions of a class have the same set of methods as the initial version. If needs change, it then becomes easy to swap out one alternative for another. For example, the Reader() classes all have a single method, $read(), without arguments. Therefore, any bit of functionality that requires data to be read in can be modified simply by writing a new reader and swapping it in for the old one, without needing to modify any of the code that calls the $read() method. The methods in alternative classes should return the same type of object, but obviously the return value itself can and should vary.
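The following sketch illustrates the swap with two hypothetical reader classes written in base R. CSVReaderSketch, TSVReaderSketch and count_rows are names invented for this example and are not the macpan2 readers; both classes expose a $read() method without arguments that returns a data frame.

```r
# Two hypothetical alternative classes with the same interface.
CSVReaderSketch = function(file) {
  self = new.env()
  self$file = file
  self$read = function() read.csv(self$file)
  structure(self, class = c("CSVReaderSketch", "ReaderSketch"))
}
TSVReaderSketch = function(file) {
  self = new.env()
  self$file = file
  self$read = function() read.delim(self$file)
  structure(self, class = c("TSVReaderSketch", "ReaderSketch"))
}

# Calling code depends only on the shared $read() interface.
count_rows = function(reader) nrow(reader$read())

# Write the same small table in both formats to temporary files.
csv_file = tempfile(fileext = ".csv")
tsv_file = tempfile(fileext = ".tsv")
flows = data.frame(from = c("S", "I"), to = c("I", "R"))
write.csv(flows, csv_file, row.names = FALSE)
write.table(flows, tsv_file, sep = "\t", row.names = FALSE)

# Swapping one reader for the other requires no change to count_rows.
count_rows(CSVReaderSketch(csv_file))  # 2
count_rows(TSVReaderSketch(tsv_file))  # 2
```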
Argument Fields
These are the simplest kinds of object components, and essentially behave as lists. Argument fields store arguments to the constructor. For example here is an object with two argument fields.
A = function(x, y) {
...
self$x = x
self$y = y
...
}
These fields can be accessed using the standard $ or [[ operators.
a = A(x = 10, y = 20)
a$x == 10 ## TRUE
a$y == 20 ## TRUE
Note that although it is possible to set such fields, it is not recommended. Rather one should use $refresh() methods as described below.
Static Fields
Static fields store values derived from arguments to the constructor. Static fields are similar to argument fields, but they contain derived quantities that depend on the arguments rather than the arguments themselves. A simple example is to store the sum of two arguments in a static field.
A = function(x, y) {
...
self$z = x + y
...
}
Note that static fields may need to be updated by $refresh() methods.
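One way such a method could look is sketched below in base R. The signature of $refresh() and the decision to pass new argument values to it are assumptions made for this illustration, not a description of how macpan2 implements refreshing.

```r
# Hypothetical class with argument fields x and y and a static field z.
A = function(x, y) {
  self = new.env()
  self$x = x
  self$y = y
  self$z = x + y
  # Assumed refresh method: update the argument fields and recompute
  # every static field that depends on them, all in one place.
  self$refresh = function(x, y) {
    self$x = x
    self$y = y
    self$z = x + y
    invisible(self)
  }
  structure(self, class = c("A", "Base"))
}

a = A(x = 10, y = 20)
a$z  # 30
a$refresh(x = 1, y = 2)
a$z  # 3
```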
Standard Methods
Standard methods compute and return values derived from arguments to the constructor. These methods should only be used if they are cheap to run, so that regeneration and consistency are balanced. But this pattern is generally the preferred option because it ensures consistency more directly, which makes it the simplest to reason about and maintain.
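To make the consistency argument concrete, the sketch below puts a static field and a standard method side by side in one hypothetical class, SumSketch, written with base R only. The names z_static and z are invented for this illustration.

```r
# Hypothetical class contrasting a static field with a standard method.
SumSketch = function(x, y) {
  self = new.env()
  self$x = x                            # argument field
  self$y = y                            # argument field
  self$z_static = x + y                 # static field: computed once
  self$z = function() self$x + self$y   # standard method: recomputed per call
  structure(self, class = c("SumSketch", "Base"))
}

s = SumSketch(10, 20)
s$x = 100    # setting an argument field directly (not recommended)
s$z_static   # 30: the static field is now out of date
s$z()        # 120: the method is consistent with the current fields
```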
Composition
Objects can be composed of other objects. Composition of objects and classes looks like this.
A = function(...) {
...
self$b = B(self)
...
}
...
B = function(a) {
...
self$a = a
...
}
Then other developers and users can do the following.
a = A(...)
a$b$method(...)
This keeps classes small because B can have methods instead of A, and small classes are easier to test and stabilize. Testing of A can focus on the methods directly in A, and then A can be extended by composing new classes like B.
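A runnable version of this pattern might look like the following sketch, which fills in the elided parts with hypothetical content (a values field and a total method) using base R only.

```r
# Hypothetical component class: it keeps a reference to its container.
B = function(a) {
  self = new.env()
  self$a = a
  self$total = function() sum(self$a$values)
  structure(self, class = c("B", "Base"))
}

# Hypothetical container class: it composes a B object and passes itself in.
A = function(values) {
  self = new.env()
  self$values = values
  self$b = B(self)
  structure(self, class = c("A", "Base"))
}

a = A(c(1, 2, 3))
a$b$total()  # 6

# Because environments are passed by reference, B sees later changes to A.
a$values = c(10, 20)
a$b$total()  # 30
```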
Refresh Methods
Methods refreshing fields when shallow copies of those fields are in several composed objects … When a field gets edited, the simplest thing to do is
Private Methods
Private methods should only be used by other methods in the class. There is nothing stopping a developer or a user from calling a private method, but there is no guarantee that the private method will have consistent behaviour or even exist. To communicate privacy, private methods should start with a dot as the following example shows.
A = function(...) {
...
self$.private = function(...) {...}
...
self$public = function(...) {
...
self$.private(...)
...
}
}
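Here is a small runnable sketch of the convention, with hypothetical method names (.validate and double) and base R in place of the macpan2 helpers. A side benefit of the dot prefix is that ls does not list such names by default.

```r
# Hypothetical class with one private and one public method.
A = function(x) {
  self = new.env()
  self$x = x
  # Private by convention: the name starts with a dot.
  self$.validate = function() stopifnot(is.numeric(self$x))
  # Public method that relies on the private one.
  self$double = function() {
    self$.validate()
    2 * self$x
  }
  structure(self, class = c("A", "Base"))
}

a = A(21)
a$double()               # 42
ls(a)                    # the private method is not listed
ls(a, all.names = TRUE)  # the private method is listed
```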
Method Caching
Developers can manage the performance costs of computationally expensive methods through method caching. When a developer calls a cached method for the first time, it computes the result, stores it in a cache, and returns the result. Subsequent method evaluations simply retrieve the cached value, improving efficiency. Developers can ensure consistency by invalidating the cache whenever objects change, allowing them to balance the cost of regeneration with the need for consistency.
A = function(method_dependency, ...) {
...
self$method_dependency = method_dependency
...
self$expensive_method_1 = function() {
...
}
...
self$expensive_method_2 = function() {
...
}
...
self$cheap_method = function() {
...
}
...
self$modify_dependency = function(...) {
...
self$cache$expensive_method_1$invalidate()
self$cache$expensive_method_2$invalidate()
...
}
...
initialize_cache(self, "expensive_method_1", "expensive_method_2")
...
}
a = A(...)
# takes time to return
a$expensive_method_1()
# returns immediately with the value computed previously
# and stored in the cache
a$expensive_method_1()
# change object and invalidate the cache to enforce consistency
a$modify_dependency(...)
# again takes time to return, but the value is different because the
# object was modified
a$expensive_method_1()
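The caching idea itself can be sketched with a hand-rolled cache in base R. This sketch does not use initialize_cache or the cache objects shown above; the .cache field and the n_computations counter are hypothetical and exist only to make the behaviour visible.

```r
# Hypothetical class with one cached method and one modifying method.
A = function(x) {
  self = new.env()
  self$x = x
  self$.cache = NULL        # NULL means that nothing is cached
  self$n_computations = 0   # counts real computations, for illustration
  self$expensive_method = function() {
    if (is.null(self$.cache)) {
      self$n_computations = self$n_computations + 1
      self$.cache = sum(sqrt(self$x))  # stand-in for costly work
    }
    self$.cache
  }
  self$modify_dependency = function(x) {
    self$x = x
    self$.cache = NULL      # invalidate the cache to enforce consistency
    invisible(self)
  }
  structure(self, class = c("A", "Base"))
}

a = A(c(1, 4, 9))
a$expensive_method()   # 6, computed
a$expensive_method()   # 6, retrieved from the cache
a$n_computations       # 1
a$modify_dependency(c(16, 25))
a$expensive_method()   # 9, recomputed because the object changed
```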