Obtain period midpoints and average daily rates for count data
Usage
period_averager(
  data,
  count_col = "cases_this_period",
  start_col = "period_start_date",
  end_col = "period_end_date",
  norm_col = NULL,
  norm_const = 1e+05,
  keep_raw = TRUE,
  keep_cols = names(data)
)
Arguments
- data
Data frame containing, at minimum, columns for the period start date, the period end date, and a count variable.
- count_col
Character, name of count data column.
- start_col
Character, name of start date column.
- end_col
Character, name of end date column.
- norm_col
Character, name of the column giving data for normalization. A good option is often population_reporting, a column found in many datasets that gives the total size of the reference population for the count data. To skip normalization, set norm_col to NULL, which is the default.
- norm_const
Numeric constant by which the daily_rate column is multiplied when a norm_col is supplied. The default, 1e5, gives daily_rate in units of count per day per 100,000 individuals when norm_col holds the reference population size.
- keep_raw
Logical value indicating whether to force all *_col columns (the columns named by count_col, start_col, end_col, and norm_col) into the output, even if they are not listed in keep_cols, and to place them at the beginning of the column list. The default is TRUE.
- keep_cols
Character vector of the names of columns in the input data to retain in the output. All columns are retained by default.
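The column-selection rule described for keep_raw and keep_cols can be sketched in base R as follows. This is a hypothetical helper (`select_output_cols` is not part of the package and is not its source code). The start, end, count ordering of the forced columns is inferred from the example output, and the position of norm_col within that group is an assumption.

```r
# Hypothetical helper, NOT the package's source code: chooses output columns
# according to the keep_raw / keep_cols rules described above.
select_output_cols <- function(data,
                               count_col = "cases_this_period",
                               start_col = "period_start_date",
                               end_col = "period_end_date",
                               norm_col = NULL,
                               keep_raw = TRUE,
                               keep_cols = names(data)) {
  # Forced columns; c() silently drops a NULL norm_col.
  # Assumption: norm_col is placed after the count column.
  raw_cols <- c(start_col, end_col, count_col, norm_col)
  if (keep_raw) {
    # Forced columns first, then the other requested columns, no duplicates
    unique(c(raw_cols, keep_cols))
  } else {
    keep_cols
  }
}

data <- data.frame(
  disease = "senioritis",
  period_start_date = as.Date("2023-04-03"),
  period_end_date = as.Date("2023-04-09"),
  cases_this_period = 61,
  location = "college"
)

# Returns period_start_date, period_end_date, cases_this_period, disease, location
select_output_cols(data, keep_cols = c("disease", "location"))

# Returns only disease and location
select_output_cols(data, keep_raw = FALSE, keep_cols = c("disease", "location"))
```

Using unique() keeps the forced columns in front even when keep_cols is left at its default of all column names.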
Value
Data frame containing the following fields.
- Columns from the original dataset specified using keep_raw and keep_cols.
- year: Year of the period start date (the start_col column, period_start_date by default).
- num_days: Length of the period in days, measured from the beginning of the period start date to the end of the period end date, so both dates are counted.
- period_mid_time: Timestamp of the middle of the period.
- period_mid_date: Date containing the period_mid_time.
- daily_rate: Daily count rate. By default, daily_rate = count_col / num_days. If norm_col is specified, daily_rate = norm_const * count_col / num_days / norm_col. In these formulas, norm_const is a numeric constant, num_days is a derived numeric column, and count_col and norm_col refer to columns supplied in the input data object.
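As a rough illustration of how the derived columns follow from these definitions, the hypothetical base-R helper `derive_period_cols` below (not the package's implementation) computes them from vectors of start dates, end dates, counts and, optionally, normalization values. It assumes the midpoint is measured from midnight UTC on the start date, which reproduces the 12:00:00 timestamps in the example output; the package's own time-zone handling is not described on this page.

```r
# Hypothetical helper, NOT the package's implementation: computes the derived
# columns described in the Value section for vectors of period data.
derive_period_cols <- function(start, end, count, norm = NULL, norm_const = 1e5) {
  # Inclusive length: from the beginning of `start` to the end of `end`
  num_days <- as.numeric(difftime(end, start, units = "days")) + 1
  # Assumption: each period begins at midnight UTC on its start date
  start_time <- as.POSIXct(format(start), tz = "UTC")
  # Midpoint timestamp: half of the period length after the start
  period_mid_time <- start_time + num_days * 86400 / 2
  period_mid_date <- as.Date(period_mid_time, tz = "UTC")
  # Daily rate, optionally scaled to norm_const units of the norm column
  daily_rate <- count / num_days
  if (!is.null(norm)) {
    daily_rate <- norm_const * daily_rate / norm
  }
  data.frame(
    year = as.integer(format(start, "%Y")),
    num_days = num_days,
    period_mid_time = period_mid_time,
    period_mid_date = period_mid_date,
    daily_rate = daily_rate
  )
}

# First row of the example data: 61 cases from 2023-04-03 to 2023-04-09
derive_period_cols(as.Date("2023-04-03"), as.Date("2023-04-09"), 61)

# Same row normalized by a made-up population of 250,000 (rate per 100,000)
derive_period_cols(as.Date("2023-04-03"), as.Date("2023-04-09"), 61, norm = 250000)
```

For the first call this gives num_days = 7, a midpoint of 2023-04-06 12:00:00 and daily_rate = 61 / 7 ≈ 8.714, in line with the first row of the example output.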
Examples
set.seed(666)
data <- data.frame(disease = "senioritis"
  , period_start_date = seq(as.Date("2023-04-03"), as.Date("2023-06-05"), by = 7)
  , period_end_date = seq(as.Date("2023-04-09"), as.Date("2023-06-11"), by = 7)
  , cases_this_period = sample(0:100, 10, replace = TRUE)
  , location = "college"
)
period_averager(data, keep_raw = TRUE, keep_cols = c("disease", "location"))
#> period_start_date period_end_date cases_this_period disease location year
#> 1 2023-04-03 2023-04-09 61 senioritis college 2023
#> 2 2023-04-10 2023-04-16 95 senioritis college 2023
#> 3 2023-04-17 2023-04-23 10 senioritis college 2023
#> 4 2023-04-24 2023-04-30 27 senioritis college 2023
#> 5 2023-05-01 2023-05-07 13 senioritis college 2023
#> 6 2023-05-08 2023-05-14 4 senioritis college 2023
#> 7 2023-05-15 2023-05-21 11 senioritis college 2023
#> 8 2023-05-22 2023-05-28 32 senioritis college 2023
#> 9 2023-05-29 2023-06-04 49 senioritis college 2023
#> 10 2023-06-05 2023-06-11 2 senioritis college 2023
#> num_days period_mid_time period_mid_date daily_rate
#> 1 7 2023-04-06 12:00:00 2023-04-06 8.7142857
#> 2 7 2023-04-13 12:00:00 2023-04-13 13.5714286
#> 3 7 2023-04-20 12:00:00 2023-04-20 1.4285714
#> 4 7 2023-04-27 12:00:00 2023-04-27 3.8571429
#> 5 7 2023-05-04 12:00:00 2023-05-04 1.8571429
#> 6 7 2023-05-11 12:00:00 2023-05-11 0.5714286
#> 7 7 2023-05-18 12:00:00 2023-05-18 1.5714286
#> 8 7 2023-05-25 12:00:00 2023-05-25 4.5714286
#> 9 7 2023-06-01 12:00:00 2023-06-01 7.0000000
#> 10 7 2023-06-08 12:00:00 2023-06-08 0.2857143