Normalize Disease Hierarchy
normalize_disease_hierarchy.Rd
Take a tidy data set with a potentially complex disease hierarchy and flatten that hierarchy so that, within any particular time and location (or any other context defined by `grouping_columns`), all diseases in the `disease` column have the same `nesting_disease`.
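The flattening idea can be illustrated with a minimal, hypothetical sketch in plain Python. It is not the package's R implementation. It assumes the following: records are dicts with the documented columns plus a hypothetical `cases` count; the lookup is a dict mapping `disease` to `nesting_disease`, with `""` marking a basal disease; and the flattening rule keeps only the most specific diseases reported in each context and re-nests them directly under their basal disease. The helper names `ancestors` and `flatten_hierarchy` and the toy data are illustrative only.

```python
from collections import defaultdict

# Assumed context columns; in the documented function these are set by
# `grouping_columns`.
DEFAULT_GROUPING = ("period_start_date", "period_end_date", "location")


def ancestors(disease, lookup):
    """List every disease above `disease` in the global lookup.

    `lookup` maps disease -> nesting_disease, with "" marking a basal
    disease. The hierarchy is assumed to be acyclic.
    """
    found = []
    parent = lookup.get(disease, "")
    while parent:
        found.append(parent)
        parent = lookup.get(parent, "")
    return found


def flatten_hierarchy(records, lookup, grouping_columns=DEFAULT_GROUPING):
    """Keep the most specific diseases reported in each context and nest
    them directly under their basal disease (hypothetical rule)."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[c] for c in grouping_columns) + (rec["basal_disease"],)
        groups[key].append(rec)

    flattened = []
    for recs in groups.values():
        reported = {r["disease"] for r in recs}
        # Drop a disease if another disease reported in the same context
        # nests under it, directly or indirectly.
        covered = set()
        for d in reported:
            covered.update(ancestors(d, lookup))
        for r in recs:
            if r["disease"] in covered:
                continue
            out = dict(r)
            if out["disease"] != out["basal_disease"]:
                out["nesting_disease"] = out["basal_disease"]
            flattened.append(out)
    return flattened


# Toy global hierarchy and data (illustrative values, not real records).
lookup = {
    "typhoid_paratyphoid": "",
    "typhoid": "typhoid_paratyphoid",
    "paratyphoid": "typhoid_paratyphoid",
    "paratyphoid_a": "paratyphoid",
    "paratyphoid_b": "paratyphoid",
}


def rec(disease, nesting, cases, location="loc_1"):
    return {"disease": disease, "nesting_disease": nesting,
            "basal_disease": "typhoid_paratyphoid",
            "period_start_date": "1930-01-01", "period_end_date": "1930-12-31",
            "location": location, "cases": cases}


data = [
    rec("typhoid_paratyphoid", "", 50),
    rec("typhoid", "typhoid_paratyphoid", 30),
    rec("paratyphoid", "typhoid_paratyphoid", 20),
    rec("paratyphoid_a", "paratyphoid", 12),
    rec("paratyphoid_b", "paratyphoid", 8),
    rec("paratyphoid", "typhoid_paratyphoid", 5, location="loc_2"),
]

for r in flatten_hierarchy(data, lookup):
    print(r["location"], r["disease"], r["nesting_disease"], r["cases"])
# prints:
# loc_1 typhoid typhoid_paratyphoid 30
# loc_1 paratyphoid_a typhoid_paratyphoid 12
# loc_1 paratyphoid_b typhoid_paratyphoid 8
# loc_2 paratyphoid typhoid_paratyphoid 5
```

In `loc_1` the intermediate `paratyphoid` row and the basal total are dropped because finer diseases were reported there. In `loc_2`, `paratyphoid` is the finest level available, so it is kept. In both contexts every remaining disease shares the same `nesting_disease`.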
Arguments
- data
A tidy data set with at least the following columns: `disease`, `nesting_disease`, `basal_disease`, `period_start_date`, `period_end_date`, and `location`. The last three define the context and can be changed with `grouping_columns`.
- disease_lookup
A lookup table with `disease` and `nesting_disease` columns describing a global disease hierarchy. This global hierarchy is applied locally to flatten the disease hierarchy at each point in time and space in `data`.
- grouping_columns
Character vector of column names used to group `data` into contexts (for example, a particular time and location) within which the hierarchy is flattened.
- basal_diseases_to_prune
Character vector of `disease` names to remove from `data`.
- find_unaccounted_cases
Whether to create new records when the total count across leaf diseases is less than the reported total for their basal disease.
- specials_pattern
Optional regular expression matching `disease` names in `data` that should be added to the lookup table. This is useful for disease names that do not appear in historical sources but are produced during harmonization. The most common example is `"_unaccounted$"`, which is the default. Setting this argument to `NULL` prevents any special disease names from being added to the lookup table.
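The interplay between `find_unaccounted_cases` and `specials_pattern` can be sketched in plain Python. This is a hypothetical illustration, not the package's R code. It assumes the following: records are dicts with the documented columns plus a hypothetical `cases` count; each context still contains its basal-disease total, and all other rows are already-flattened leaves; a gap is stored under the name `<basal_disease>_unaccounted` (chosen to match the default pattern); and matched special names are nested directly under their basal disease. The function names `find_unaccounted` and `add_specials_to_lookup` are illustrative.

```python
import re
from collections import defaultdict

# Assumed context columns (see `grouping_columns`).
GROUPING = ("period_start_date", "period_end_date", "location")


def find_unaccounted(records, grouping_columns=GROUPING, suffix="_unaccounted"):
    """Return new records for cases in a basal total that the leaves miss.

    Assumes every non-basal record in a context is a leaf of the
    already-flattened hierarchy, so summing them does not double count.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[c] for c in grouping_columns) + (rec["basal_disease"],)
        groups[key].append(rec)

    new_records = []
    for recs in groups.values():
        basal = recs[0]["basal_disease"]
        totals = [r for r in recs if r["disease"] == basal]
        leaf_cases = [r["cases"] for r in recs if r["disease"] != basal]
        if not totals or not leaf_cases:
            continue  # nothing to compare in this context
        gap = totals[0]["cases"] - sum(leaf_cases)
        if gap > 0:
            extra = dict(totals[0])
            extra.update(disease=basal + suffix, nesting_disease=basal, cases=gap)
            new_records.append(extra)
    return new_records


def add_specials_to_lookup(records, lookup, specials_pattern=r"_unaccounted$"):
    """Return a copy of `lookup` (disease -> nesting_disease) in which disease
    names matching `specials_pattern` are nested under their basal disease."""
    extended = dict(lookup)
    if specials_pattern is None:
        return extended  # analogous to setting the argument to NULL
    pattern = re.compile(specials_pattern)
    for rec in records:
        name = rec["disease"]
        if name not in extended and pattern.search(name):
            extended[name] = rec["basal_disease"]
    return extended


# Toy data (illustrative values, not real records).
base = {"basal_disease": "typhoid_paratyphoid", "period_start_date": "1930-01-01",
        "period_end_date": "1930-12-31", "location": "loc_1"}
data = [
    dict(base, disease="typhoid_paratyphoid", nesting_disease="", cases=50),
    dict(base, disease="typhoid", nesting_disease="typhoid_paratyphoid", cases=30),
    dict(base, disease="paratyphoid", nesting_disease="typhoid_paratyphoid", cases=12),
]
extra = find_unaccounted(data)
lookup = add_specials_to_lookup(
    data + extra,
    {"typhoid_paratyphoid": "", "typhoid": "typhoid_paratyphoid",
     "paratyphoid": "typhoid_paratyphoid"},
)
print(extra[0]["disease"], extra[0]["cases"])  # typhoid_paratyphoid_unaccounted 8
print(lookup["typhoid_paratyphoid_unaccounted"])  # typhoid_paratyphoid
```

The generated name ends in `_unaccounted`, so the default `specials_pattern` is what allows it to enter the lookup table. Passing `None` leaves the lookup unchanged.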