Normalize Disease Hierarchy
normalize_disease_hierarchy.Rd
Take a tidy data set with a potentially complex disease hierarchy
and flatten this hierarchy so that, at any particular time and location
(or some other context), all diseases in the disease column have the
same nesting_disease.
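
To make the goal concrete, the following base R sketch builds a small, made-up data set with the minimal columns this page lists for the data argument and checks, per context, whether all rows share a single nesting_disease. The disease names, dates, and locations are hypothetical, and the check is not part of the package; it only illustrates the condition the function aims to establish, not how the function achieves it.

```r
# Hypothetical toy data using the minimal column set described on this page.
toy <- data.frame(
  disease           = c("hepatitis a", "hepatitis b",
                        "hepatitis a", "hepatitis a, acute"),
  nesting_disease   = c("hepatitis", "hepatitis",
                        "hepatitis", "hepatitis a"),
  basal_disease     = "hepatitis",
  period_start_date = as.Date("2000-01-02"),
  period_end_date   = as.Date("2000-01-08"),
  location          = c("X", "X", "Y", "Y"),
  stringsAsFactors  = FALSE
)

# Columns that define a context (time and location in this sketch).
grouping_columns <- c("period_start_date", "period_end_date", "location")

# Build one label per context by pasting the grouping columns together.
context_id <- do.call(paste, c(toy[grouping_columns], sep = " | "))

# Count the distinct nesting_disease values within each context.
n_nesting <- tapply(toy$nesting_disease, context_id,
                    function(x) length(unique(x)))

# A context is "flat" when every row shares the same nesting_disease.
is_flat <- n_nesting == 1
print(is_flat)
```

In this toy data, location X is already flat, while location Y mixes two levels of the hierarchy and is the kind of context that would need flattening.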
Arguments
- data
A tidy data set with the following minimal set of columns:
disease, nesting_disease, basal_disease, period_start_date, period_end_date, and location. Note that the latter three can be modified with grouping_columns.
- disease_lookup
A lookup table with disease and nesting_disease columns that describe a global disease hierarchy. This hierarchy is applied locally to flatten the disease hierarchy at each point in time and space in the tidy data set passed in the data argument.
- grouping_columns
Character vector of column names to use when grouping to determine the context.
- basal_diseases_to_prune
Character vector of diseases to remove from data.
- find_unaccounted_cases
Make new records for instances when the sum of leaf diseases is less than the reported total for their basal disease.
- specials_pattern
Optional regular expression to use to match disease names in data that should be added to the lookup table. This is useful for disease names that are not historical but were produced for harmonization purposes. The most common example is "_unaccounted$", which is the default. Setting this argument to NULL avoids adding any special disease names to the lookup table.
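
Two of the arguments above lend themselves to a short illustration in base R. The sketch below uses made-up counts to show the arithmetic described for find_unaccounted_cases, then applies grepl() with the default pattern "_unaccounted$" to show which disease names specials_pattern would match. The counts, the disease names, and the name hepatitis_unaccounted are assumptions for illustration only; this is not the package's implementation.

```r
# Made-up numbers: the basal disease reports 100 cases, its leaves sum to 80.
basal_total <- 100
leaf_counts <- c("hepatitis a" = 50, "hepatitis b" = 30)

# Cases reported for the basal disease that no leaf disease accounts for.
unaccounted <- basal_total - sum(leaf_counts)

# A new record is only warranted when the leaves sum to less than the total.
needs_unaccounted_record <- sum(leaf_counts) < basal_total

# Default specials_pattern from this page, applied with base R's grepl().
specials_pattern <- "_unaccounted$"
disease_names <- c("hepatitis a", "hepatitis b", "hepatitis_unaccounted")

# Keep only the names that end in "_unaccounted".
is_special <- grepl(specials_pattern, disease_names)
special_names <- disease_names[is_special]

print(unaccounted)
print(special_names)
```

The trailing "$" anchors the pattern to the end of the string, so a name that merely contains "_unaccounted" in the middle would not match.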