Take a tidy data set with a potentially complex disease hierarchy and flatten that hierarchy so that, within any particular context (by default, a time period and location), all diseases in the disease column have the same nesting_disease.

Usage

normalize_disease_hierarchy(
  data,
  disease_lookup,
  grouping_columns = c("period_start_date", "period_end_date", "location"),
  basal_diseases_to_prune = character(),
  find_unaccounted_cases = TRUE,
  specials_pattern = "_unaccounted$"
)

Arguments

data

A tidy data set with the following minimal set of columns: disease, nesting_disease, basal_disease, period_start_date, period_end_date, and location. Note that the last three can be changed with grouping_columns.

disease_lookup

A lookup table with disease and nesting_disease columns that describes a global disease hierarchy. This global hierarchy is applied locally to flatten the disease hierarchy at each point in time and space in the tidy data set passed to the data argument.
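As a rough illustration of what such a lookup table might look like, here is a minimal base R sketch. The disease names are hypothetical, and the use of an empty string to mark a disease with no parent is an assumption made for this example, not something this page confirms.

```r
# Hypothetical lookup table; the disease names are illustrative only.
# Assumption: an empty nesting_disease marks a top-level disease.
disease_lookup <- data.frame(
  disease = c("hepatitis", "hepatitis-A", "hepatitis-B"),
  nesting_disease = c("", "hepatitis", "hepatitis"),
  stringsAsFactors = FALSE
)

# Leaf diseases are those that never appear as the nesting disease of another.
is_leaf <- !(disease_lookup$disease %in% disease_lookup$nesting_disease)
leaf_diseases <- disease_lookup$disease[is_leaf]
print(leaf_diseases)
```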

grouping_columns

Character vector of column names to use when grouping to determine the context.
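To make the idea of a context concrete, the following base R sketch groups toy records by the default grouping columns and checks whether each group has a single nesting_disease. The records and the helper names (context, n_nesting, is_flat) are hypothetical, and this is not the function's actual implementation.

```r
# Toy records; values are illustrative only.
records <- data.frame(
  period_start_date = c("1950-01-01", "1950-01-01", "1950-01-08"),
  period_end_date = c("1950-01-07", "1950-01-07", "1950-01-14"),
  location = c("ON", "ON", "ON"),
  disease = c("hepatitis-A", "hepatitis-B", "hepatitis"),
  nesting_disease = c("hepatitis", "hepatitis", ""),
  stringsAsFactors = FALSE
)

grouping_columns <- c("period_start_date", "period_end_date", "location")

# One factor level per distinct combination of the grouping columns.
context <- interaction(records[grouping_columns], drop = TRUE)

# Count the distinct nesting diseases within each context.
n_nesting <- tapply(
  records$nesting_disease, context,
  function(x) length(unique(x))
)

# The hierarchy is flat if every context has exactly one nesting disease.
is_flat <- all(n_nesting == 1)
print(is_flat)
```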

basal_diseases_to_prune

Character vector of diseases to remove from data.

find_unaccounted_cases

If TRUE, make new records for instances when the sum of leaf diseases is less than the reported total for their basal disease.
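The arithmetic behind an unaccounted record can be sketched in base R as follows. The cases column, the disease names, and the naming of the new record (basal disease name plus "_unaccounted", chosen to match the default specials_pattern) are assumptions for illustration; the function's own internals may differ.

```r
# Toy records for a single context (one period and location).
# Assumption: 'cases' is a hypothetical count column.
records <- data.frame(
  disease = c("hepatitis", "hepatitis-A", "hepatitis-B"),
  nesting_disease = c("", "hepatitis", "hepatitis"),
  basal_disease = c("hepatitis", "hepatitis", "hepatitis"),
  cases = c(10, 4, 3),
  stringsAsFactors = FALSE
)

# Reported total for the basal disease and the sum over its leaf diseases.
basal_total <- records$cases[records$disease == "hepatitis"]
leaf_total <- sum(records$cases[records$nesting_disease == "hepatitis"])

# Cases reported under the basal disease but not under any leaf.
unaccounted <- basal_total - leaf_total

# Only create a record when the leaves fall short of the total.
new_record <- NULL
if (unaccounted > 0) {
  new_record <- data.frame(
    disease = "hepatitis_unaccounted",
    nesting_disease = "hepatitis",
    basal_disease = "hepatitis",
    cases = unaccounted,
    stringsAsFactors = FALSE
  )
}
print(new_record)
```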

specials_pattern

Optional regular expression used to match disease names in data that should be added to the lookup table. This is useful for disease names that are not historical but were produced for harmonization purposes. The most common example is "_unaccounted$", which is the default. Setting this argument to NULL avoids adding any special disease names to the lookup table.
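For reference, the default pattern matches only names that end in "_unaccounted". A small base R check, using hypothetical disease names:

```r
# Hypothetical disease names; only the second ends with "_unaccounted".
disease_names <- c("hepatitis", "hepatitis_unaccounted", "unaccounted_other")

# The "$" anchors the match to the end of the name.
is_special <- grepl("_unaccounted$", disease_names)
special_names <- disease_names[is_special]
print(special_names)
```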