Prerequisites
This document assumes a solid understanding of template model builder.
Matrices
A list of numerical scalars, vectors, and matrices are passed as input variables from R to C++. Scalars are passed as 1-by-1 matrices and vectors as n-by-1, so that all of these numerical variables are actually matrices.
Treating all numerical objects as matrices might seem strange, but it is similar to how R treats all numerical objects as vectors. In this way, R matrices are vectors with dimensions. We are just extending this logic and requiring that all vectors have dimensions. We are more restrictive than R however by not allowing multidimensional arrays.
Missing values are not allowed in any matrix. On the R side these variables are all associated with a unique name, and on the C++ side each of these names is associated with a 0-based index. All indices in this spec are 0-based.
All matrices must have zero or more rows and columns. Any matrix with either zero rows or zero columns is an empty matrix. Empty matrices allow for placeholders for matrices that will be defined during simulations.
These matrices are passed in using the DATA_STRUCT(mats)
TMB macro, with an associated C++ struct
for extracting the
component matrices by index.
Parameters
Two vectors of parameters are passed to C++.
-
PARAMETER_VECTOR(params)
– this vector becomes the argument of the objective function, and can therefore be optimized using a non-linear optimizer or simulated using MCMC -
PARAMETER_VECTOR(random)
– this vector becomes the random effects that will be integrated out of the objective function using a Laplace approximation
The values in these parameter vectors are used to update certain
elements in the matrix-valued variables within mats
. The
elements to be updated are described on the R side by two data frames
with one row for every matrix element to be replaced. These data frames
are constructed in R and passed to C++. The columns for the data frame
associated with params
are the following.
-
DATA_IVECTOR(p_par_id)
– indices into theparams
vector giving the parameter to use when updating each element inmats
that is to be updated -
DATA_IVECTOR(p_mat_id)
– indices intomats
giving the matrices with elements to be replaced by parameters -
DATA_IVECTOR(p_row_id)
– indices of the rows within each matrix associated with an element to be replaced by parameters -
DATA_IVECTOR(p_col_id)
– indices of the columns within each matrix associated with an element to be replaced by parameters
After these vectors are read into C++, a loop is executed over the
rows of this table that replaces the associated mats
elements with params
elements.
The random
vector is treated similarly to
params
in that it is associated with a data frame
describing what elements in mats
should be replaced. The
names of the columns of this table are:
DATA_IVECTOR(r_par_id)
DATA_IVECTOR(r_mat_id)
DATA_IVECTOR(r_row_id)
DATA_IVECTOR(r_col_id)
Our implied convention is that p
stands for ‘fixed
Parameters’ and r
for ‘Random parameters’.
Trajectory Simulation
After the input matrices in mats
are updated using
params
and random
, these matrices can be
modified. We refer to this process of modification as trajectory
simulation. There are three phases to the trajectory simulation
process.
- Before the simulation loop
- During the simulation loop
- After the simulation loop
Simulation time is measured in dimensionless iterations, and is
indexed by an integer, t
, such that
0 <= t <= T+1
, where T
is the number of
iterations of the loop. The value of each matrix at t = 0
is the value of that matrix just before the first iteration of the
simulation loop begins. The value of each matrix at
0 < t < T+1
is the value of the matrix at the very
end of the t
th iteration of the simulation loop. The value
of each matrix at t = T+1
is the value of the matrix at the
end of the simulation.
This time-indexing system is used for two purposes.
- To optionally return the simulation history to the user
- For matrix modifications that depend on past values
The user can opt in and out of these uses on a per-matrix basis, by specifying two vectors that have one element per matrix.
-
DATA_IVECTOR(mats_save_hist)
- Equals
0
if the matrix is overwritten at each simulation iteration,t
- Equals
1
if all computed values of the matrix are saved
- Equals
-
DATA_IVECTOR(mats_return)
- Equals
0
if the matrix is not returned to the R side at the end of the simulation - Equals
1
otherwise
- Equals
The number of iterations, T
, is passed to TMB as
DATA_INTEGER(time_steps)
.
Expressions
The mathematical details of how matrices are modified during the simulation process is controlled on the R side by supplying expressions. Each expression is the right-hand-side of an R formula involving the following three types of objects.
- Names of matrices in
mats
- Names of functions that are currently allowed by the engine
- Numeric literals (e.g.
3.14
)
A simulation is the sequential evaluation of these expressions in a
user-specified order. Each expression may be evaluated during one of the
three simulation phases – before, during, and after the simulation loop.
Expressions evaluated before and after the simulation loop are evaluated
only once, whereas those evaluated during the loop are evaluated at
every iteration. The phase of each expression is controlled by the
DATA_IVECTOR(eval_schedule)
vector described at the end of
this section.
Each mathematical expression can be used by C++ in several ways. Information on how each expression should be used is passed to C++ using a set of vectors. Each of these vectors is the same length, with one element per expression.
-
DATA_IVECTOR(expr_output_id)
- Index into
mats
identifying the matrix produced by the expression
- Index into
-
DATA_IVECTOR(expr_sim_block)
- Identifies whether or not the expression should be evaluated inside
a
SIMULATE
macro within TMB - A value of
0
indicates that the expression is to be evaluated without theSIMULATE
macro (we expect this to be the standard case), whereas a value of1
indicates evaluation inside aSIMULATE
macro - Note that if an expression is evaluated in a
SIMULATE
macro and is returned to the user throughmats_return == 1
, then it must also be returned within theSIMULATE
macro
- Identifies whether or not the expression should be evaluated inside
a
-
DATA_IVECTOR(expr_num_p_table_rows)
- Number of rows associated with each expression in the parse table (see section on Parse Tables)
Each expression is evaluated in the order in which it appears in these vectors.
The DATA_IVECTOR(eval_schedule)
vector gives the phase
in which each of these expressions should be evaluated. This vector has
three elements giving the number of expressions to evaluate before,
during, and after the simulation loop. In particular the first
eval_schedule[0]
expressions are evaluated before the
simulation loop, the next eval_schedule[1]
expressions are
evaluated at every iteration of the simulation loop, and the next
eval_schedule[2]
expressions are evaluated after the
simulation loop.
Inputs are invalid if the sum of the elements of
eval_schedule
does not equal the number of elements in each
of expr_output_id
, expr_sim_block
, and
expr_num_p_table_rows
.
Parse Tables
Each expression is parsed into a table of numbers that represents the
expression and can be passed to C++. Each row in this table corresponds
to a step in the process of evaluating the expression. These steps
correspond to one of three types of things, identified by column
n
:
- A function – if
n > 0
– see section on Function Definitions - A matrix – if
n == 0
– see section on Matrices - A literal – if
n == -1
– see section on Literals
If the row does correspond to a function, then column n
gives the number of arguments in that function.
The column, x
, gives an index for looking up the
specific instance of each of these three types of entities. For example,
if n == 0
then the x
column gives the index
into the mats
list for getting the appropriate matrix If
n == -1
then x
gives an index into
literals
and if n > 0
x
gives
an index into a list of valid functions. The i
column is
only relevant for functions, and indicates the row of the table
representing the first argument to that function.
This table is processed on the C++ side with a recursive function that either:
- for rows associated with functions: looks up a valid function from a list of function definitions, and recursively calls itself
- for rows associated with matrices: looks up and returns a matrix in
the
mats
list - for rows associated with literals: looks up and returns a literal from a list of valid literals
The parse tables for all expressions are concatenated row-wise and passed to C++ as a set of three vectors of equal length.
DATA_IVECTOR(p_table_n)
DATA_IVECTOR(p_table_x)
DATA_IVECTOR(p_table_i)
The first three vectors correspond to n
, x
,
and i
as discussed in this section.
The expr_num_p_table_rows
vector (defined above in the
section on Expressions) is used to relate
each expression to a set of rows in this concatenated parse table. The
elements of this vector contain the number of parse table rows
associated with each expression. The ordering of the elements is
consistent with the ordering of the concatenation of the individual
parse tables, and so row indices are not necessary.
Literals
A global list of valid literals for all expressions is passed to C++
as a numeric vector, DATA_VECTOR(literals)
.
Function Definitions
The functions that are used in each [Expression] must be in a valid list of functions defined on the C++ side. Most of these functions will have analogues on the R side to make it as easy as possible for R users to reason about their expressions.
Valid functions take one or more matrix-valued arguments and return a single matrix. The number of arguments does not need to be known when the model is defined, but some functions may optionally require a predefined number of arguments.
Extending functionality of the engine will typically involve simply adding function definitions to the list of valid functions. All function definitions have the following objects available to them.
-
r
– A list of matrices giving the arguments to the function (e.g.r[0]
returns the first matrix). -
index2mats
– A list of integers giving the indices for identifying the matrices in themats
list (e.g.index2mats[0]
returns the index of the first argument). -
t
– Time step -
hist
– A list of lists of matrices giving the history of the simulation (e.g.hist[4][0]
returns the value of the first matrix at time step4
). -
n
– The number of arguments.
Function definitions should be grouped into types if several
functions require similar processing. The main example that we have is
element-wise binary operators, including +
, *
,
-
, /
, ^
. These functions all
require the same pre-processing to make sure that the matrix dimensions
of the two operands are compatible and that compatibility is
conveniently defined for the user.
Objective Function
The return value of the objective function is an expression that could depend on the values of all the matrices at the end of the simulation and the entire saved simulation history. This expression is passed as a parse table.
DATA_IVECTOR(o_table_n)
DATA_IVECTOR(o_table_x)
DATA_IVECTOR(o_table_i)
The implied convention here is that o
is for ‘Objective
function parse table’ and p
is for ‘matrix Parse
table’.